In [188]:
from IPython.display import Image
Image(url= "https://s3-us-west-2.amazonaws.com/reference/images/glossary/nenbuninux.jpg")
Out[188]:

The above image is of a stull chart which plots glazes by their UMF of silica and alumina and classifies them into unfused mattes, semimattes, bright, and devitrified glazes.

The UMF refers to the Unity Molecular Formula, which is composed of oxide ratios for various elements in the glaze. In the UMF, the sum of the fluxing oxides RO and R2O are equal to one and all other oxides are adjusted to match. The UMF is useful because the ratios of certain oxides can be used to predict things like firing temperature, surface gloss, and durability.

In [2536]:
Image(url= "http://www.castlehs.com/users/ccozart/images/ConeInKiln.jpg")
Out[2536]:

Image of pyrometric cones softening in a hot kiln

Firing Temperature for glazes is often measured by pyrometric cone, cones that melt when they have reached a certain temperature. the cone system is a logatithmic unit of measurment most artists dont go above cone 14 (1351°C)as it would require a special kiln. To simplify temperature Artists typically break these cones into groups high fire(cones greater than 8) Mid range(between cones 1 and 8) and low fire(below cone 01).

In [2547]:
import numpy as np
import pandas as pd 


import matplotlib.pyplot as plt
import matplotlib.colors
import seaborn as sns

import scipy.stats as stats
import sklearn
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.cluster import DBSCAN
from sklearn.cluster import AgglomerativeClustering
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

from sklearn import datasets, metrics
import umap

import os
import warnings 

pd.set_option('display.max_rows',1000)
pd.set_option('display.max_columns',500)
pd.set_option('display.width',1000)

warnings.filterwarnings('ignore')
In [ ]:
 
In [2548]:
#import Glazy Dataset 
glaze_df = pd.read_csv('/Users/robertshiles/CSV_files/glazy_data_june_2019.csv')
glaze_df.head(5)
Out[2548]:
id name created_by_user_id material_type_id material_type material_state_id material_state rgb_r rgb_g rgb_b surface_type transparency_type from_orton_cone to_orton_cone is_analysis is_primitive is_theoretical is_private SiO2_percent Al2O3_percent B2O3_percent Li2O_percent K2O_percent Na2O_percent KNaO_percent BeO_percent MgO_percent CaO_percent SrO_percent BaO_percent ZnO_percent PbO_percent P2O5_percent F_percent V2O5_percent Cr2O3_percent MnO_percent MnO2_percent FeO_percent Fe2O3_percent CoO_percent NiO_percent CuO_percent Cu2O_percent CdO_percent TiO2_percent ZrO_percent ZrO2_percent SnO2_percent HfO2_percent Nb2O5_percent Ta2O5_percent MoO3_percent WO3_percent OsO2_percent IrO2_percent PtO2_percent Ag2O_percent Au2O3_percent GeO2_percent As2O3_percent Sb2O3_percent Bi2O3_percent SeO2_percent La2O3_percent CeO2_percent PrO2_percent Pr2O3_percent Nd2O3_percent U3O8_percent Sm2O3_percent Eu2O3_percent Tb2O3_percent Dy2O3_percent Ho2O3_percent Er2O3_percent Tm2O3_percent Yb2O3_percent Lu2O3_percent SiO2_umf Al2O3_umf B2O3_umf Li2O_umf K2O_umf Na2O_umf KNaO_umf BeO_umf MgO_umf CaO_umf SrO_umf BaO_umf ZnO_umf PbO_umf P2O5_umf F_umf V2O5_umf Cr2O3_umf MnO_umf MnO2_umf FeO_umf Fe2O3_umf CoO_umf NiO_umf CuO_umf Cu2O_umf CdO_umf TiO2_umf ZrO_umf ZrO2_umf SnO2_umf HfO2_umf Nb2O5_umf Ta2O5_umf MoO3_umf WO3_umf OsO2_umf IrO2_umf PtO2_umf Ag2O_umf Au2O3_umf GeO2_umf As2O3_umf Sb2O3_umf Bi2O3_umf SeO2_umf La2O3_umf CeO2_umf PrO2_umf Pr2O3_umf Nd2O3_umf U3O8_umf Sm2O3_umf Eu2O3_umf Tb2O3_umf Dy2O3_umf Ho2O3_umf Er2O3_umf Tm2O3_umf Yb2O3_umf Lu2O3_umf SiO2_xumf Al2O3_xumf B2O3_xumf Li2O_xumf K2O_xumf Na2O_xumf KNaO_xumf BeO_xumf MgO_xumf CaO_xumf SrO_xumf BaO_xumf ZnO_xumf PbO_xumf P2O5_xumf F_xumf V2O5_xumf Cr2O3_xumf MnO_xumf MnO2_xumf FeO_xumf Fe2O3_xumf CoO_xumf NiO_xumf CuO_xumf Cu2O_xumf CdO_xumf TiO2_xumf ZrO_xumf ZrO2_xumf SnO2_xumf HfO2_xumf Nb2O5_xumf Ta2O5_xumf MoO3_xumf WO3_xumf OsO2_xumf IrO2_xumf PtO2_xumf Ag2O_xumf Au2O3_xumf GeO2_xumf As2O3_xumf Sb2O3_xumf Bi2O3_xumf SeO2_xumf La2O3_xumf CeO2_xumf PrO2_xumf Pr2O3_xumf Nd2O3_xumf U3O8_xumf Sm2O3_xumf Eu2O3_xumf Tb2O3_xumf Dy2O3_xumf Ho2O3_xumf Er2O3_xumf Tm2O3_xumf Yb2O3_xumf Lu2O3_xumf SiO2_mol Al2O3_mol B2O3_mol Li2O_mol K2O_mol Na2O_mol KNaO_mol BeO_mol MgO_mol CaO_mol SrO_mol BaO_mol ZnO_mol PbO_mol P2O5_mol F_mol V2O5_mol Cr2O3_mol MnO_mol MnO2_mol FeO_mol Fe2O3_mol CoO_mol NiO_mol CuO_mol Cu2O_mol CdO_mol TiO2_mol ZrO_mol ZrO2_mol SnO2_mol HfO2_mol Nb2O5_mol Ta2O5_mol MoO3_mol WO3_mol OsO2_mol IrO2_mol PtO2_mol Ag2O_mol Au2O3_mol GeO2_mol As2O3_mol Sb2O3_mol Bi2O3_mol SeO2_mol La2O3_mol CeO2_mol PrO2_mol Pr2O3_mol Nd2O3_mol U3O8_mol Sm2O3_mol Eu2O3_mol Tb2O3_mol Dy2O3_mol Ho2O3_mol Er2O3_mol Tm2O3_mol Yb2O3_mol Lu2O3_mol SiO2_percent_mol Al2O3_percent_mol B2O3_percent_mol Li2O_percent_mol K2O_percent_mol Na2O_percent_mol KNaO_percent_mol BeO_percent_mol MgO_percent_mol CaO_percent_mol SrO_percent_mol BaO_percent_mol ZnO_percent_mol PbO_percent_mol P2O5_percent_mol F_percent_mol V2O5_percent_mol Cr2O3_percent_mol MnO_percent_mol MnO2_percent_mol FeO_percent_mol Fe2O3_percent_mol CoO_percent_mol NiO_percent_mol CuO_percent_mol Cu2O_percent_mol CdO_percent_mol TiO2_percent_mol ZrO_percent_mol ZrO2_percent_mol SnO2_percent_mol HfO2_percent_mol Nb2O5_percent_mol Ta2O5_percent_mol MoO3_percent_mol WO3_percent_mol OsO2_percent_mol IrO2_percent_mol PtO2_percent_mol Ag2O_percent_mol Au2O3_percent_mol GeO2_percent_mol As2O3_percent_mol Sb2O3_percent_mol Bi2O3_percent_mol SeO2_percent_mol La2O3_percent_mol CeO2_percent_mol PrO2_percent_mol Pr2O3_percent_mol Nd2O3_percent_mol U3O8_percent_mol Sm2O3_percent_mol Eu2O3_percent_mol Tb2O3_percent_mol Dy2O3_percent_mol Ho2O3_percent_mol Er2O3_percent_mol Tm2O3_percent_mol Yb2O3_percent_mol Lu2O3_percent_mol SiO2_Al2O3_ratio_umf R2O_umf RO_umf SiO2_Al2O3_ratio_xumf R2O_xumf RO_xumf loi
0 1 Base Glaze Peltzman 1 460 Glaze 2.0 Production 255.0 255.0 255.0 Glossy Transparent 8 8 0 0 0 0 59.3609 10.7873 0.0000 0.0000 1.9939 3.1374 5.1314 0.0 0.0052 10.8304 0.0 0.0000 5.0000 0.0 0.0130 0.0 0.0 0.0 0.0 0.0 0.0 0.0664 0.0000 0.0 0.0000 0.0 0.0 0.0190 0.0 0.0000 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.0260 0.3241 0.0000 0.0000 0.0648 0.1550 0.2199 0.0 0.0004 0.5916 0.0 0.0000 0.1882 0.0 0.0003 0.0 0.0 0.0 0.0 0.0 0.0 0.0013 0.0000 0.0 0.0000 0.0 0.0 0.0007 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.0222 0.3236 0.0000 0.0000 0.0648 0.1548 0.2196 0.0 0.0004 0.5908 0.0 0.0000 0.1879 0.0 0.0003 0.0 0.0 0.0 0.0 0.0 0.0 0.0013 0.0000 0.0 0.0000 0.0 0.0 0.0007 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.9879 0.1058 0.0000 0.0000 0.0212 0.0506 0.0718 0.0 0.0001 0.1931 0.0 0.0000 0.0614 0.0 0.0001 0.0 0.0 0.0 0.0 0.0 0.0 0.0004 0.0000 0.0 0.0000 0.0 0.0 0.0002 0.0 0.0000 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 69.5261 7.4455 0.0000 0.0000 1.4897 3.5624 5.0521 0.0 0.0090 13.5916 0.0 0.0000 4.3233 0.0 0.0064 0.0 0.0 0.0 0.0 0.0 0.0 0.0292 0.0000 0.0 0.0000 0.0 0.0 0.0168 0.0 0.0000 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 9.3380 0.2199 0.7801 9.3380 0.2196 0.7804 8.7865
1 3 Clay Porcelain Peltzman 1 260 Porcelain 2.0 Production 255.0 255.0 255.0 NaN NaN 8 8 0 0 0 0 63.6878 24.4314 0.0000 0.0000 1.6759 1.0448 2.7207 0.0 0.1577 0.2818 0.0 0.0000 0.0000 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.3759 0.0000 0.0 0.0000 0.0 0.0 0.0197 0.0 0.0000 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 24.3183 5.4974 0.0000 0.0000 0.4082 0.3868 0.7949 0.0 0.0898 0.1153 0.0 0.0000 0.0000 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0540 0.0000 0.0 0.0000 0.0 0.0 0.0057 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 23.0723 5.2157 0.0000 0.0000 0.3873 0.3669 0.7542 0.0 0.0852 0.1094 0.0 0.0000 0.0000 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0512 0.0000 0.0 0.0000 0.0 0.0 0.0054 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0600 0.2396 0.0000 0.0000 0.0178 0.0169 0.0346 0.0 0.0039 0.0050 0.0 0.0000 0.0000 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0024 0.0000 0.0 0.0000 0.0 0.0 0.0002 0.0 0.0000 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 78.7628 17.8052 0.0000 0.0000 1.3221 1.2526 2.5747 0.0 0.2907 0.3734 0.0 0.0000 0.0000 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.1749 0.0000 0.0 0.0000 0.0 0.0 0.0183 0.0 0.0000 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4.4236 0.7949 0.2051 4.4236 0.7542 0.2458 6.3251
2 4 Celadon-type glaze David Pier 1 500 Celadon 2.0 Production NaN NaN NaN Glossy Transparent 9 11 0 0 0 0 66.1209 12.0850 0.0000 0.0000 3.1351 0.8463 3.9814 0.0 0.0600 12.6768 0.0 0.0000 0.0000 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.1678 0.0000 0.0 0.0129 0.0 0.0 0.0058 0.0 0.2694 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4.0091 0.4318 0.0000 0.0000 0.1213 0.0497 0.1710 0.0 0.0054 0.8236 0.0 0.0000 0.0000 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0038 0.0000 0.0 0.0006 0.0 0.0 0.0003 0.0 0.0080 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.9915 0.4299 0.0000 0.0000 0.1207 0.0495 0.1702 0.0 0.0054 0.8200 0.0 0.0000 0.0000 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0038 0.0000 0.0 0.0006 0.0 0.0 0.0003 0.0 0.0079 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.1005 0.1185 0.0000 0.0000 0.0333 0.0137 0.0469 0.0 0.0015 0.2261 0.0 0.0000 0.0000 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0011 0.0000 0.0 0.0002 0.0 0.0 0.0001 0.0 0.0022 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 73.5137 7.9179 0.0000 0.0000 2.2234 0.9122 3.1356 0.0 0.0994 15.1016 0.0 0.0000 0.0000 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0702 0.0000 0.0 0.0108 0.0 0.0 0.0048 0.0 0.1460 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 9.2845 0.1710 0.8290 9.2845 0.1708 0.8292 2.4961
3 5 Pier's Pure Lux-Deluxe Revised 1 470 Clear 2.0 Production 255.0 255.0 255.0 Glossy Transparent 8 9 0 0 0 0 53.4984 9.2907 2.0827 1.0129 0.2954 1.1114 1.4068 0.0 0.0358 0.1243 0.0 21.0677 3.9833 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0837 0.0000 0.0 0.0128 0.0 0.0 0.0035 0.0 0.9372 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.6429 0.3728 0.1224 0.1387 0.0128 0.0734 0.0862 0.0 0.0036 0.0091 0.0 0.5622 0.2002 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0021 0.0000 0.0 0.0007 0.0 0.0 0.0002 0.0 0.0311 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.6327 0.3718 0.1221 0.1383 0.0128 0.0732 0.0860 0.0 0.0036 0.0090 0.0 0.5606 0.1997 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0021 0.0000 0.0 0.0007 0.0 0.0 0.0002 0.0 0.0310 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.8904 0.0911 0.0299 0.0339 0.0031 0.0179 0.0211 0.0 0.0009 0.0022 0.0 0.1374 0.0489 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0005 0.0000 0.0 0.0002 0.0 0.0 0.0000 0.0 0.0076 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 70.4323 7.2079 2.3664 2.6814 0.2481 1.4185 1.6666 0.0 0.0702 0.1753 0.0 10.8692 3.8714 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0414 0.0000 0.0 0.0128 0.0 0.0 0.0034 0.0 0.6016 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 9.7715 0.2249 0.7751 9.7715 0.2249 0.7751 4.6680
4 6 Blue Acero 1 750 Blue 2.0 Production 13.0 153.0 186.0 Satin - Matte Semi-opaque 9 11 0 0 0 0 47.5690 17.3426 0.0000 0.0000 5.0019 1.5421 6.5440 0.0 1.2274 11.7201 0.0 0.0000 0.0000 0.0 0.0601 0.0 0.0 0.0 0.0 0.0 0.0 1.8451 0.1214 0.0 0.0000 0.0 0.0 0.0880 0.0 0.0000 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2.4940 0.5358 0.0000 0.0000 0.1673 0.0784 0.2457 0.0 0.0959 0.6584 0.0 0.0000 0.0000 0.0 0.0013 0.0 0.0 0.0 0.0 0.0 0.0 0.0364 0.0051 0.0 0.0000 0.0 0.0 0.0035 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.3946 0.5145 0.0000 0.0000 0.1606 0.0753 0.2359 0.0 0.0921 0.6322 0.0 0.0000 0.0000 0.0 0.0013 0.0 0.0 0.0 0.0 0.0 0.0 0.0349 0.0049 0.0 0.0000 0.0 0.0 0.0033 0.0 0.0000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7917 0.1701 0.0000 0.0000 0.0531 0.0249 0.0780 0.0 0.0305 0.2090 0.0 0.0000 0.0000 0.0 0.0004 0.0 0.0 0.0 0.0 0.0 0.0 0.0116 0.0016 0.0 0.0000 0.0 0.0 0.0011 0.0 0.0000 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 61.1857 13.1454 0.0000 0.0000 4.1039 1.9229 6.0268 0.0 2.3536 16.1525 0.0 0.0000 0.0000 0.0 0.0327 0.0 0.0 0.0 0.0 0.0 0.0 0.8930 0.1252 0.0 0.0000 0.0 0.0 0.0851 0.0 0.0000 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4.6546 0.2457 0.7543 4.6546 0.2359 0.7641 13.0899

When examining the dataset several issues arise, the first issue with the dataset is sparsity, which has led to a large number of unfilled variables. Particularly metal oxides that pertain to one glaze and not to another. Glazes often substitute various metal oxides for one another. The second issue with the dataset is labeling. We can treat the continuous variables for the chemical makeup of each glaze as ground truths because the chemical composition of raw materials doesn't vary between batches of materials, and can easily be calculated by the software. However, the physical descriptions and firing temperatures are labeled by the individuals uploading the recipes, this has likely resulted in some unaccounted variation related to opinions on firing, matte-ness, opacity, and even descriptions of color.

Data Cleaning

In [2549]:
# count the zero and null values and add them to a new dataframe 

nulls = glaze_df.isin([np.nan]).sum()
zeros = glaze_df.isin([0]).sum()

#create a dataframe to store the null values
null_df =  pd.DataFrame()
null_df['nulls'] = nulls
null_df['zeros'] = zeros

# sort dataframe 
null_df = null_df.sort_values(by=['nulls'],ascending=False )
null_df = null_df.T

#pd.set_option(optname, val)
null_df.head()

# to assist in viewing use the Heatmap below 
#plt.figure(figsize=(40,40))
#sns.heatmap(null_df)
#country_zeros.head()
Out[2549]:
Sm2O3_percent CeO2_percent Er2O3_percent Ho2O3_percent Dy2O3_percent Tb2O3_percent Eu2O3_percent IrO2_mol U3O8_percent Nd2O3_percent Pr2O3_percent PrO2_percent La2O3_percent Yb2O3_percent SeO2_percent Bi2O3_percent Sb2O3_percent As2O3_percent GeO2_percent Au2O3_percent Ag2O_percent PtO2_percent IrO2_percent OsO2_percent Tm2O3_percent Lu2O3_percent Ta2O5_percent Nd2O3_mol Lu2O3_mol Yb2O3_mol Tm2O3_mol Er2O3_mol Ho2O3_mol Dy2O3_mol Tb2O3_mol Eu2O3_mol Sm2O3_mol U3O8_mol Pr2O3_mol PtO2_mol PrO2_mol CeO2_mol La2O3_mol SeO2_mol Bi2O3_mol Sb2O3_mol As2O3_mol GeO2_mol Au2O3_mol Ag2O_mol WO3_percent MoO3_percent Sm2O3_percent_mol Ho2O3_percent_mol PrO2_percent_mol Pr2O3_percent_mol U3O8_percent_mol Nb2O5_percent Eu2O3_percent_mol Tb2O3_percent_mol Dy2O3_percent_mol Er2O3_percent_mol La2O3_percent_mol Tm2O3_percent_mol Yb2O3_percent_mol Lu2O3_percent_mol MoO3_mol Ta2O5_mol Nb2O5_mol HfO2_mol CeO2_percent_mol Nd2O3_percent_mol SeO2_percent_mol OsO2_percent_mol HfO2_percent OsO2_mol WO3_mol HfO2_percent_mol Bi2O3_percent_mol Ta2O5_percent_mol MoO3_percent_mol WO3_percent_mol Nb2O5_percent_mol IrO2_percent_mol PtO2_percent_mol Ag2O_percent_mol Au2O3_percent_mol GeO2_percent_mol Sb2O3_percent_mol As2O3_percent_mol transparency_type surface_type rgb_b rgb_g rgb_r from_orton_cone to_orton_cone material_state material_state_id CuO_mol CoO_mol FeO_mol NiO_mol Fe2O3_mol id MnO2_mol MnO_mol Lu2O3_xumf Yb2O3_xumf Tm2O3_xumf Er2O3_xumf Ho2O3_xumf Dy2O3_xumf CdO_mol Tb2O3_xumf Eu2O3_xumf Sm2O3_xumf U3O8_xumf Nd2O3_xumf Pr2O3_xumf PrO2_xumf CeO2_xumf SiO2_mol Al2O3_mol B2O3_mol BaO_mol Cr2O3_mol V2O5_mol F_mol P2O5_mol PbO_mol ZnO_mol SrO_mol Li2O_mol CaO_mol MgO_mol BeO_mol KNaO_mol Na2O_mol K2O_mol Cu2O_mol F_percent_mol TiO2_mol TiO2_percent_mol MnO2_percent_mol FeO_percent_mol Fe2O3_percent_mol CoO_percent_mol NiO_percent_mol CuO_percent_mol Cu2O_percent_mol CdO_percent_mol ZrO_percent_mol ZrO_mol ZrO2_percent_mol SnO2_percent_mol SiO2_Al2O3_ratio_umf R2O_umf RO_umf SiO2_Al2O3_ratio_xumf R2O_xumf RO_xumf MnO_percent_mol Cr2O3_percent_mol V2O5_percent_mol SeO2_xumf ZrO2_mol SnO2_mol SiO2_percent_mol Al2O3_percent_mol B2O3_percent_mol Li2O_percent_mol K2O_percent_mol Na2O_percent_mol KNaO_percent_mol BeO_percent_mol MgO_percent_mol CaO_percent_mol SrO_percent_mol BaO_percent_mol ZnO_percent_mol PbO_percent_mol P2O5_percent_mol La2O3_xumf Cu2O_xumf Bi2O3_xumf Nb2O5_umf ZnO_umf BaO_umf SrO_umf CaO_umf MgO_umf BeO_umf KNaO_umf Na2O_umf K2O_umf Li2O_umf B2O3_umf Al2O3_umf SiO2_umf SnO2_percent ZrO2_percent PbO_umf P2O5_umf F_umf CuO_umf SnO2_umf ZrO2_umf ZrO_umf TiO2_umf CdO_umf Cu2O_umf NiO_umf V2O5_umf CoO_umf Fe2O3_umf FeO_umf MnO2_umf MnO_umf Cr2O3_umf ZrO_percent TiO2_percent CdO_percent SiO2_percent KNaO_percent Na2O_percent K2O_percent Li2O_percent B2O3_percent Al2O3_percent is_private MgO_percent is_theoretical is_primitive is_analysis material_type material_type_id created_by_user_id BeO_percent CaO_percent Cu2O_percent MnO_percent CuO_percent NiO_percent CoO_percent Fe2O3_percent FeO_percent MnO2_percent Cr2O3_percent SrO_percent V2O5_percent F_percent P2O5_percent PbO_percent ZnO_percent BaO_percent HfO2_umf Ta2O5_umf Sb2O3_xumf MoO3_umf CuO_xumf NiO_xumf CoO_xumf Fe2O3_xumf FeO_xumf MnO2_xumf MnO_xumf Cr2O3_xumf V2O5_xumf F_xumf P2O5_xumf PbO_xumf ZnO_xumf BaO_xumf SrO_xumf name CdO_xumf TiO2_xumf OsO2_xumf As2O3_xumf GeO2_xumf Au2O3_xumf Ag2O_xumf PtO2_xumf IrO2_xumf WO3_xumf ZrO_xumf MoO3_xumf Ta2O5_xumf Nb2O5_xumf HfO2_xumf SnO2_xumf ZrO2_xumf CaO_xumf MgO_xumf BeO_xumf As2O3_umf PrO2_umf CeO2_umf La2O3_umf SeO2_umf Bi2O3_umf Sb2O3_umf GeO2_umf Nd2O3_umf Au2O3_umf Ag2O_umf PtO2_umf IrO2_umf OsO2_umf WO3_umf Pr2O3_umf U3O8_umf KNaO_xumf Lu2O3_umf Na2O_xumf K2O_xumf Li2O_xumf B2O3_xumf Al2O3_xumf SiO2_xumf Yb2O3_umf Sm2O3_umf Tm2O3_umf Er2O3_umf Ho2O3_umf Dy2O3_umf Tb2O3_umf Eu2O3_umf loi
nulls 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 6391 3701 2564 2500 2500 2500 220 180 92 92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
zeros 1556 1556 1545 1548 1556 1556 1556 1556 1556 1545 1542 1555 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1545 1556 1556 1556 1545 1548 1556 1556 1556 1556 1556 1542 1556 1555 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1548 1555 1542 1556 1556 1556 1556 1556 1545 1556 1556 1556 1556 1556 1556 1556 1556 1556 1545 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 1556 0 0 946 940 871 0 0 0 0 6490 6692 7734 7835 436 0 7557 7714 7947 7947 7947 7936 7939 7947 7947 7947 7947 7947 7947 7936 7933 7946 7947 23 38 3791 7114 7486 7941 7912 3426 7898 6385 7172 6522 74 478 7947 68 292 197 7938 7912 1524 1389 7555 7734 407 6692 7835 6490 7938 7946 7947 7947 7219 6957 47 53 35 42 41 23 7698 7483 7941 7947 7219 6957 23 38 3791 6522 196 287 68 7947 470 73 7162 7113 6384 7898 3178 7947 7938 7947 7947 6384 7114 7163 73 472 7947 68 287 197 6522 3791 46 31 6957 7219 7898 3192 7912 6496 6957 7220 7947 1403 7947 7938 7837 7941 6699 416 7735 7563 7701 7485 7947 1389 7946 23 68 287 196 6522 3791 38 7947 470 7947 7946 7947 0 0 0 7947 73 7938 7698 6490 7835 6692 407 7734 7555 7483 7162 7941 7912 3178 7898 6384 7113 7947 7947 7947 7947 6490 7836 6692 414 7734 7555 7701 7484 7941 7912 3194 7898 6384 7114 7163 0 7947 1407 7947 7947 7947 7947 7947 7947 7947 7947 7947 7947 7947 7947 7947 6957 7220 73 472 7947 7947 7946 7947 7947 7947 7947 7947 7947 7936 7947 7947 7947 7947 7947 7947 7933 7947 68 7947 287 197 6522 3791 41 26 7947 7947 7947 7936 7939 7947 7947 7947 30
In [2550]:
glaze_df.describe()
Out[2550]:
id created_by_user_id material_type_id material_state_id rgb_r rgb_g rgb_b is_analysis is_primitive is_theoretical is_private SiO2_percent Al2O3_percent B2O3_percent Li2O_percent K2O_percent Na2O_percent KNaO_percent BeO_percent MgO_percent CaO_percent SrO_percent BaO_percent ZnO_percent PbO_percent P2O5_percent F_percent V2O5_percent Cr2O3_percent MnO_percent MnO2_percent FeO_percent Fe2O3_percent CoO_percent NiO_percent CuO_percent Cu2O_percent CdO_percent TiO2_percent ZrO_percent ZrO2_percent SnO2_percent HfO2_percent Nb2O5_percent Ta2O5_percent MoO3_percent WO3_percent OsO2_percent IrO2_percent PtO2_percent Ag2O_percent Au2O3_percent GeO2_percent As2O3_percent Sb2O3_percent Bi2O3_percent SeO2_percent La2O3_percent CeO2_percent PrO2_percent Pr2O3_percent Nd2O3_percent U3O8_percent Sm2O3_percent Eu2O3_percent Tb2O3_percent Dy2O3_percent Ho2O3_percent Er2O3_percent Tm2O3_percent Yb2O3_percent Lu2O3_percent SiO2_umf Al2O3_umf B2O3_umf Li2O_umf K2O_umf Na2O_umf KNaO_umf BeO_umf MgO_umf CaO_umf SrO_umf BaO_umf ZnO_umf PbO_umf P2O5_umf F_umf V2O5_umf Cr2O3_umf MnO_umf MnO2_umf FeO_umf Fe2O3_umf CoO_umf NiO_umf CuO_umf Cu2O_umf CdO_umf TiO2_umf ZrO_umf ZrO2_umf SnO2_umf HfO2_umf Nb2O5_umf Ta2O5_umf MoO3_umf WO3_umf OsO2_umf IrO2_umf PtO2_umf Ag2O_umf Au2O3_umf GeO2_umf As2O3_umf Sb2O3_umf Bi2O3_umf SeO2_umf La2O3_umf CeO2_umf PrO2_umf Pr2O3_umf Nd2O3_umf U3O8_umf Sm2O3_umf Eu2O3_umf Tb2O3_umf Dy2O3_umf Ho2O3_umf Er2O3_umf Tm2O3_umf Yb2O3_umf Lu2O3_umf SiO2_xumf Al2O3_xumf B2O3_xumf Li2O_xumf K2O_xumf Na2O_xumf KNaO_xumf BeO_xumf MgO_xumf CaO_xumf SrO_xumf BaO_xumf ZnO_xumf PbO_xumf P2O5_xumf F_xumf V2O5_xumf Cr2O3_xumf MnO_xumf MnO2_xumf FeO_xumf Fe2O3_xumf CoO_xumf NiO_xumf CuO_xumf Cu2O_xumf CdO_xumf TiO2_xumf ZrO_xumf ZrO2_xumf SnO2_xumf HfO2_xumf Nb2O5_xumf Ta2O5_xumf MoO3_xumf WO3_xumf OsO2_xumf IrO2_xumf PtO2_xumf Ag2O_xumf Au2O3_xumf GeO2_xumf As2O3_xumf Sb2O3_xumf Bi2O3_xumf SeO2_xumf La2O3_xumf CeO2_xumf PrO2_xumf Pr2O3_xumf Nd2O3_xumf U3O8_xumf Sm2O3_xumf Eu2O3_xumf Tb2O3_xumf Dy2O3_xumf Ho2O3_xumf Er2O3_xumf Tm2O3_xumf Yb2O3_xumf Lu2O3_xumf SiO2_mol Al2O3_mol B2O3_mol Li2O_mol K2O_mol Na2O_mol KNaO_mol BeO_mol MgO_mol CaO_mol SrO_mol BaO_mol ZnO_mol PbO_mol P2O5_mol F_mol V2O5_mol Cr2O3_mol MnO_mol MnO2_mol FeO_mol Fe2O3_mol CoO_mol NiO_mol CuO_mol Cu2O_mol CdO_mol TiO2_mol ZrO_mol ZrO2_mol SnO2_mol HfO2_mol Nb2O5_mol Ta2O5_mol MoO3_mol WO3_mol OsO2_mol IrO2_mol PtO2_mol Ag2O_mol Au2O3_mol GeO2_mol As2O3_mol Sb2O3_mol Bi2O3_mol SeO2_mol La2O3_mol CeO2_mol PrO2_mol Pr2O3_mol Nd2O3_mol U3O8_mol Sm2O3_mol Eu2O3_mol Tb2O3_mol Dy2O3_mol Ho2O3_mol Er2O3_mol Tm2O3_mol Yb2O3_mol Lu2O3_mol SiO2_percent_mol Al2O3_percent_mol B2O3_percent_mol Li2O_percent_mol K2O_percent_mol Na2O_percent_mol KNaO_percent_mol BeO_percent_mol MgO_percent_mol CaO_percent_mol SrO_percent_mol BaO_percent_mol ZnO_percent_mol PbO_percent_mol P2O5_percent_mol F_percent_mol V2O5_percent_mol Cr2O3_percent_mol MnO_percent_mol MnO2_percent_mol FeO_percent_mol Fe2O3_percent_mol CoO_percent_mol NiO_percent_mol CuO_percent_mol Cu2O_percent_mol CdO_percent_mol TiO2_percent_mol ZrO_percent_mol ZrO2_percent_mol SnO2_percent_mol HfO2_percent_mol Nb2O5_percent_mol Ta2O5_percent_mol MoO3_percent_mol WO3_percent_mol OsO2_percent_mol IrO2_percent_mol PtO2_percent_mol Ag2O_percent_mol Au2O3_percent_mol GeO2_percent_mol As2O3_percent_mol Sb2O3_percent_mol Bi2O3_percent_mol SeO2_percent_mol La2O3_percent_mol CeO2_percent_mol PrO2_percent_mol Pr2O3_percent_mol Nd2O3_percent_mol U3O8_percent_mol Sm2O3_percent_mol Eu2O3_percent_mol Tb2O3_percent_mol Dy2O3_percent_mol Ho2O3_percent_mol Er2O3_percent_mol Tm2O3_percent_mol Yb2O3_percent_mol Lu2O3_percent_mol SiO2_Al2O3_ratio_umf R2O_umf RO_umf SiO2_Al2O3_ratio_xumf R2O_xumf RO_xumf loi
count 7947.000000 7947.000000 7947.000000 7855.00000 5447.000000 5447.000000 5447.000000 7947.0 7947.000000 7947.0 7947.0 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.0 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7.947000e+03 7947.000000 7947.0 7947.000000 7947.000000 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.000000 1556.000000 1556.000000 1556.0 1556.0 1556.0 1556.0 1556.0 1556.000000 1556.000000 1556.0 1556.0 1556.0 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.0 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.0 7947.000000 7947.0 7947.000000 7947.000000 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.000000 7947.000000 7947.000000 7947.0 7947.0 7947.0 7947.0 7947.0 7947.000000 7947.000000 7947.0 7947.0 7947.0 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.0 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.0 7947.000000 7947.0 7947.000000 7947.000000 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.0 7947.000000 7947.000000 7947.000000 7947.0 7947.0 7947.0 7947.0 7947.0 7947.000000 7947.000000 7947.0 7947.0 7947.0 7947.000000 7947.000000 7947.000000 7947.00000 7947.000000 7947.000000 7947.000000 7947.0 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.0 7947.000000 7947.0 7947.000000 7947.000000 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.000000 1556.000000 1556.000000 1556.0 1556.0 1556.0 1556.0 1556.0 1556.000000 1556.000000 1556.0 1556.0 1556.0 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.0 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7.947000e+03 7947.000000 7947.0 7947.000000 7947.000000 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.0 1556.000000 1556.000000 1556.000000 1556.0 1556.0 1556.0 1556.0 1556.0 1556.000000 1556.000000 1556.0 1556.0 1556.0 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000 7947.000000
mean 10926.789480 1186.876935 626.861457 1.96690 174.674867 169.390490 163.965486 0.0 0.000126 0.0 0.0 50.537434 12.294534 2.896148 0.413563 2.854380 2.965174 5.819554 0.0 1.584117 7.834801 0.624743 1.039245 1.367178 0.217261 0.342946 0.051990 0.002829 0.070849 0.034398 0.375453 0.020815 1.721809 0.197875 0.028177 0.415266 0.024147 1.510004e-07 0.979571 0.0 0.561547 0.508008 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000417 0.053542 0.035526 0.0 0.0 0.0 0.0 0.0 0.026465 0.037993 0.0 0.0 0.0 3.765564 0.679983 0.140338 0.038042 0.120204 0.173495 0.293700 0.0 0.125640 0.450476 0.018834 0.021478 0.044052 0.003165 0.006426 0.005789 0.000042 0.002178 0.001593 0.094102 0.001809 0.071528 0.017612 0.000817 0.055136 0.002896 0.0 0.083471 0.0 0.019410 0.012734 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000002 0.000129 0.000081 0.0 0.0 0.0 0.0 0.0 0.000053 0.000076 0.0 0.0 0.0 3.344119 0.589734 0.129333 0.035859 0.108547 0.160845 0.269391 0.0 0.114559 0.419333 0.017612 0.020176 0.042220 0.002945 0.005714 0.005676 0.000042 0.001688 0.001279 0.008897 0.000919 0.035824 0.007361 0.000725 0.013172 0.001446 0.0 0.045084 0.0 0.018472 0.009359 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000002 0.000129 0.000080 0.0 0.0 0.0 0.0 0.0 0.000052 0.000076 0.0 0.0 0.0 0.841280 0.120662 0.041599 0.01384 0.030307 0.047843 0.078149 0.0 0.039306 0.139716 0.006029 0.006778 0.016798 0.000973 0.002416 0.002737 0.000016 0.000466 0.000485 0.004319 0.000290 0.010783 0.002640 0.000377 0.005220 0.000169 0.0 0.012266 0.0 0.004557 0.003371 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000002 0.000162 0.000105 0.0 0.0 0.0 0.0 0.0 0.000070 0.000099 0.0 0.0 0.0 61.640766 8.953577 3.073338 1.030336 2.245048 3.485366 5.730414 0.0 2.896297 10.281062 0.477958 0.584692 1.245892 0.114637 0.177292 0.121982 0.001152 0.038199 0.038908 0.367872 0.022024 0.834808 0.206688 0.034145 0.421587 0.019736 8.808355e-08 0.921603 0.0 0.345582 0.262204 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000159 0.012087 0.007591 0.0 0.0 0.0 0.0 0.0 0.004863 0.007207 0.0 0.0 0.0 8.640147 0.331742 0.665238 8.641403 0.327782 0.670205 9.338430
std 11342.247819 1907.512917 219.814429 0.54413 101.602293 104.179402 106.187538 0.0 0.011218 0.0 0.0 10.711831 5.483551 4.620330 1.701140 2.471711 2.575873 3.143375 0.0 2.355657 4.962142 2.794981 4.306265 4.110626 3.142587 1.514782 1.097518 0.111406 0.673411 0.634422 3.293934 0.236725 3.532058 0.918364 0.734213 2.000550 1.250807 1.346107e-05 2.408368 0.0 2.131954 2.002780 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.016458 0.642753 0.465724 0.0 0.0 0.0 0.0 0.0 0.409383 0.491818 0.0 0.0 0.0 5.105971 2.976901 0.223816 0.112147 0.121218 0.154631 0.190999 0.0 0.156355 0.244341 0.084052 0.088488 0.114805 0.046689 0.023389 0.119374 0.001655 0.030143 0.023038 2.322766 0.067194 1.598496 0.325774 0.010702 1.380930 0.174646 0.0 2.302798 0.0 0.108305 0.108922 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000153 0.003615 0.002413 0.0 0.0 0.0 0.0 0.0 0.001864 0.002240 0.0 0.0 0.0 3.787613 2.209551 0.205162 0.106158 0.107986 0.146826 0.178491 0.0 0.145762 0.234768 0.078288 0.083158 0.111045 0.043608 0.021129 0.117612 0.001640 0.022140 0.019231 0.064713 0.010428 0.067411 0.026967 0.007398 0.044507 0.084797 0.0 0.127225 0.0 0.101495 0.032245 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000153 0.003606 0.002408 0.0 0.0 0.0 0.0 0.0 0.001860 0.002235 0.0 0.0 0.0 0.178958 0.054614 0.066365 0.05693 0.026240 0.041560 0.043874 0.0 0.058446 0.088487 0.026973 0.028086 0.050505 0.014080 0.010672 0.057770 0.000612 0.004431 0.008944 0.037889 0.003295 0.022118 0.012255 0.009830 0.025149 0.008741 0.0 0.030154 0.0 0.017302 0.013288 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000096 0.001950 0.001382 0.0 0.0 0.0 0.0 0.0 0.001084 0.001287 0.0 0.0 0.0 10.920354 4.270605 5.142859 4.328130 1.950433 2.943432 3.129093 0.0 4.336101 6.668473 2.202455 2.565168 3.779264 1.827177 0.756258 2.322429 0.045253 0.412455 0.779688 3.400356 0.269744 1.990517 1.168084 1.143197 2.356232 1.079354 7.852292e-06 2.403438 0.0 1.418373 1.243556 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.006274 0.148235 0.100065 0.0 0.0 0.0 0.0 0.0 0.075759 0.094122 0.0 0.0 0.0 11.235842 0.216101 0.218419 11.234988 0.206560 0.208222 5.289744
min 1.000000 1.000000 110.000000 1.00000 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -6.386300
25% 2446.500000 1.000000 470.000000 2.00000 79.000000 54.000000 41.000000 0.0 0.000000 0.0 0.0 46.069950 9.210800 0.000000 0.000000 1.166800 1.199100 3.843000 0.0 0.055100 4.704400 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.119300 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.015000 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 2.251450 0.283600 0.000000 0.000000 0.039900 0.071100 0.179250 0.0 0.004600 0.277600 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.002300 0.000000 0.000000 0.000000 0.000000 0.0 0.000600 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 2.082000 0.262350 0.000000 0.000000 0.036800 0.064100 0.162550 0.0 0.004300 0.255350 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.002200 0.000000 0.000000 0.000000 0.000000 0.0 0.000600 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.766750 0.090300 0.000000 0.00000 0.012400 0.019350 0.051900 0.0 0.001400 0.083900 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000700 0.000000 0.000000 0.000000 0.000000 0.0 0.000200 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 57.789350 6.579850 0.000000 0.000000 0.931150 1.448000 3.791900 0.0 0.098750 6.125000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.055100 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.013900 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 5.407850 0.197600 0.603650 5.407850 0.196900 0.610100 5.990350
50% 5319.000000 7.000000 540.000000 2.00000 252.000000 241.000000 232.000000 0.0 0.000000 0.0 0.0 52.248500 11.554600 0.744400 0.000000 2.290900 2.382500 5.402500 0.0 0.563200 8.136900 0.000000 0.000000 0.000000 0.000000 0.014600 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.251100 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.072400 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 2.925200 0.384100 0.033600 0.000000 0.090100 0.131700 0.241200 0.0 0.060900 0.473000 0.000000 0.000000 0.000000 0.000000 0.000400 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.005300 0.000000 0.000000 0.000000 0.000000 0.0 0.003000 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 2.701700 0.355900 0.030400 0.000000 0.082200 0.121200 0.225600 0.0 0.053700 0.436700 0.000000 0.000000 0.000000 0.000000 0.000300 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.005100 0.000000 0.000000 0.000000 0.000000 0.0 0.002800 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.869700 0.113300 0.010700 0.00000 0.024300 0.038400 0.070800 0.0 0.014000 0.145100 0.000000 0.000000 0.000000 0.000000 0.000100 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.001600 0.000000 0.000000 0.000000 0.000000 0.0 0.000900 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 63.252800 8.136900 0.804100 0.000000 1.815500 2.821400 5.195100 0.0 1.061600 10.512900 0.000000 0.000000 0.000000 0.000000 0.007500 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.115400 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.064700 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 7.646000 0.265500 0.733500 7.646000 0.270400 0.729100 8.991500
75% 19343.500000 1599.000000 760.000000 2.00000 255.000000 255.000000 255.000000 0.0 0.000000 0.0 0.0 57.382100 14.547400 4.130800 0.000000 3.808750 4.159000 7.097650 0.0 2.419600 10.799400 0.000000 0.000000 0.000000 0.000000 0.047700 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.464150 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.340000 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 3.669950 0.492400 0.201500 0.000000 0.158800 0.230950 0.346000 0.0 0.203450 0.638700 0.000000 0.000000 0.000000 0.000000 0.001200 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.039550 0.000000 0.000000 0.000000 0.000000 0.0 0.019600 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 3.412900 0.455650 0.186150 0.000000 0.146250 0.216050 0.316600 0.0 0.182400 0.594150 0.000000 0.000000 0.000000 0.000000 0.001100 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.036400 0.000000 0.000000 0.000000 0.000000 0.0 0.016150 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.955000 0.142700 0.059300 0.00000 0.040500 0.067100 0.094100 0.0 0.060050 0.192600 0.000000 0.000000 0.000000 0.000000 0.000300 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.009200 0.000000 0.000000 0.000000 0.000000 0.0 0.004300 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 68.098650 10.751150 4.229400 0.000000 2.974150 4.931150 6.957150 0.0 4.414550 13.962750 0.000000 0.000000 0.000000 0.000000 0.024000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.693750 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.319700 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 9.598700 0.392300 0.802400 9.598700 0.388950 0.803050 11.849800
max 39478.000000 9728.000000 1180.000000 3.00000 255.000000 255.000000 255.000000 0.0 1.000000 0.0 0.0 127.886700 77.606000 40.640000 32.592000 22.067900 29.427100 29.842700 0.0 25.818700 51.850300 32.054800 41.577700 80.000000 74.212600 33.911000 35.567700 5.509600 41.811800 36.972000 75.225100 7.821700 76.080100 44.100000 62.930000 77.669900 78.260000 1.200000e-03 45.051200 0.0 50.000000 55.555600 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.649200 13.043500 9.090900 0.0 0.0 0.0 0.0 0.0 9.090900 9.090900 0.0 0.0 0.0 169.793100 197.709800 2.188100 1.000000 1.000000 1.000000 1.000000 0.0 0.993300 1.000000 0.841200 0.848800 0.907400 1.000000 0.693700 3.624800 0.083900 1.694500 0.963200 139.189700 5.881500 140.208300 22.489300 0.679500 105.202100 11.006100 0.0 198.876900 0.0 3.466300 7.824800 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.013600 0.176100 0.108900 0.0 0.0 0.0 0.0 0.0 0.097000 0.095800 0.0 0.0 0.0 108.008800 136.473400 1.999500 1.000000 1.000000 1.000000 1.000000 0.0 0.966400 1.000000 0.776900 0.817000 0.907000 1.000000 0.654900 3.624800 0.083600 1.653800 0.770800 0.905700 0.308500 0.992900 0.853800 0.225500 0.988700 5.391100 0.0 4.466000 0.0 3.376900 0.648500 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.013600 0.175400 0.108800 0.0 0.0 0.0 0.0 0.0 0.096800 0.095700 0.0 0.0 0.0 2.231600 1.000000 0.583700 1.09070 0.234300 0.474800 0.479200 0.0 0.640600 0.924600 0.309300 0.271200 0.982900 0.332500 0.238900 1.872200 0.030300 0.275100 0.521200 0.865300 0.108900 0.476400 0.588500 0.842500 0.976400 0.546900 0.0 0.564100 0.0 0.405800 0.368600 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.003800 0.039500 0.027000 0.0 0.0 0.0 0.0 0.0 0.024100 0.023800 0.0 0.0 0.0 87.383700 66.330400 52.194100 80.904400 18.253700 38.692100 40.072400 0.0 49.251100 90.072200 28.940600 28.293400 81.076600 54.418500 14.252200 65.176300 2.348700 25.628900 50.153600 86.002300 12.250200 67.545800 62.400300 100.000000 82.038400 67.802500 7.000000e-04 48.295500 0.0 50.765600 48.388500 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.247500 3.189600 1.916100 0.0 0.0 0.0 0.0 0.0 1.709900 1.690600 0.0 0.0 0.0 423.443500 1.000000 1.000000 423.443500 1.000000 1.000000 50.383000
In [2551]:
#A lot of the columns contain all zero values 
#Remove empty variables 
column_list = []
column_list = glaze_df.columns.to_list()
columns_to_drop=[]
for column in column_list:
    if glaze_df[column].dtypes == float:
        if glaze_df[column].max()==0:
            columns_to_drop.append(column)
        else:
            None
    elif glaze_df[column].dtypes == int:
        if glaze_df[column].max()==0:
            columns_to_drop.append(column)
        else:
            None
    else:
        None
        
glaze_df = glaze_df.drop(columns = columns_to_drop)
In [2552]:
#remove non Glaze items from the data set
print('# of entries before removal:{} '.format(len(glaze_df)))
materials_to_drop = []
materials_to_drop = ['Porcelain','Hand-building','Terra Sigillata','Slip & Engobe','Sculpture', 'Throwing',
       'Clay Body','Engobe','Slipcasting','Slip','Underglaze','Stoneware','Refractory',
        'Stain','Slip-Based']
for material in materials_to_drop: 
    glaze_df = glaze_df[glaze_df.material_type != material]

    #'Salt & Soda', 'Macro','Overglaze'
# remove recipies that fall below cone 06    
cones_to_drop=['07','08''09','010','011','012','014','015', '016','018','022']
 
for cone in cones_to_drop: 
    glaze_df = glaze_df[glaze_df.to_orton_cone != cone]
    glaze_df = glaze_df[glaze_df.from_orton_cone != cone]

#drop the recipes that dont have a temperature listed 
glaze_df=glaze_df.dropna(subset=['to_orton_cone','from_orton_cone','surface_type'])    

print('# of entries after removal:{} '.format(len(glaze_df)))
# of entries before removal:7947 
# of entries after removal:5154 
In [2553]:
# count the zero and null values and add them to a new dataframe 

nulls = glaze_df.isin([np.nan]).sum()
zeros = glaze_df.isin([0]).sum()

#create a dataframe to store the null values
null_df =  pd.DataFrame()
null_df['nulls'] = nulls
null_df['zeros'] = zeros

# sort dataframe 
null_df = null_df.sort_values(by=['nulls'],ascending=False )
null_df = null_df.T

#pd.set_option(optname, val)
null_df.head()

# to assist in viewing use the Heatmap below 
#plt.figure(figsize=(40,40))
#sns.heatmap(null_df)
#country_zeros.head()
Out[2553]:
Pr2O3_percent_mol Ho2O3_mol Nd2O3_percent Ho2O3_percent Er2O3_percent PrO2_percent_mol Nd2O3_percent_mol Ho2O3_percent_mol Er2O3_percent_mol PrO2_percent Er2O3_mol Nd2O3_mol Pr2O3_mol PrO2_mol Pr2O3_percent transparency_type rgb_b rgb_g rgb_r material_state material_state_id P2O5_mol F_mol V2O5_mol Cr2O3_mol MnO_mol PbO_mol id MnO2_mol FeO_mol Fe2O3_mol CoO_mol BaO_mol NiO_mol ZnO_mol K2O_mol SrO_mol CaO_mol Fe2O3_xumf CoO_xumf NiO_xumf CuO_xumf Cu2O_xumf TiO2_xumf ZrO2_xumf SnO2_xumf PrO2_xumf Pr2O3_xumf Nd2O3_xumf Ho2O3_xumf Er2O3_xumf SiO2_mol Al2O3_mol B2O3_mol Li2O_mol Cu2O_mol Na2O_mol KNaO_mol MgO_mol CuO_mol B2O3_percent_mol TiO2_mol TiO2_percent_mol FeO_percent_mol Fe2O3_percent_mol CoO_percent_mol NiO_percent_mol CuO_percent_mol Cu2O_percent_mol CdO_percent_mol ZrO2_percent_mol MnO_percent_mol SnO2_percent_mol SiO2_Al2O3_ratio_umf R2O_umf RO_umf SiO2_Al2O3_ratio_xumf R2O_xumf RO_xumf MnO2_percent_mol Cr2O3_percent_mol ZrO2_mol KNaO_percent_mol SnO2_mol SiO2_percent_mol Al2O3_percent_mol MnO2_xumf Li2O_percent_mol K2O_percent_mol Na2O_percent_mol MgO_percent_mol V2O5_percent_mol CaO_percent_mol SrO_percent_mol BaO_percent_mol ZnO_percent_mol PbO_percent_mol P2O5_percent_mol F_percent_mol FeO_xumf PbO_xumf MnO_xumf Cr2O3_xumf V2O5_percent Cr2O3_percent MnO_percent MnO2_percent FeO_percent Fe2O3_percent CoO_percent NiO_percent CuO_percent Cu2O_percent CdO_percent TiO2_percent ZrO2_percent SnO2_percent SiO2_umf Al2O3_umf B2O3_umf Li2O_umf K2O_umf F_percent P2O5_percent PbO_percent Al2O3_percent created_by_user_id material_type_id material_type surface_type from_orton_cone to_orton_cone is_primitive SiO2_percent B2O3_percent ZnO_percent Li2O_percent K2O_percent Na2O_percent KNaO_percent MgO_percent CaO_percent SrO_percent BaO_percent Na2O_umf KNaO_umf MgO_umf KNaO_xumf Ho2O3_umf Er2O3_umf SiO2_xumf Al2O3_xumf B2O3_xumf Li2O_xumf K2O_xumf Na2O_xumf MgO_xumf Pr2O3_umf CaO_xumf SrO_xumf BaO_xumf ZnO_xumf name P2O5_xumf F_xumf V2O5_xumf Nd2O3_umf PrO2_umf CaO_umf MnO_umf SrO_umf BaO_umf ZnO_umf PbO_umf P2O5_umf F_umf V2O5_umf Cr2O3_umf MnO2_umf SnO2_umf FeO_umf Fe2O3_umf CoO_umf NiO_umf CuO_umf Cu2O_umf TiO2_umf ZrO2_umf loi
nulls 4350 4350 4350 4350 4350 4350 4350 4350 4350 4350 4350 4350 4350 4350 4350 1470 1294 1294 1294 31 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
zeros 792 796 796 796 795 803 796 796 795 803 795 796 792 803 792 0 614 628 617 0 0 2230 5133 5153 4819 5022 5133 0 4906 5018 278 4211 4635 5065 4074 105 4547 33 259 4211 5065 4148 5150 956 4699 4466 5153 5142 5146 5146 5145 4 8 2225 4231 5150 191 28 295 4148 2225 1043 947 5018 254 4211 5065 4148 5150 5153 4699 5011 4466 14 19 8 9 12 1 4905 4818 4699 28 4466 4 8 4905 4231 105 191 292 5153 33 4542 4634 4073 5133 2063 5133 5018 5133 5014 4818 5153 4818 5011 4905 5018 254 4211 5065 4148 5150 5153 947 4699 4466 9 13 2225 4231 105 5133 2063 5133 8 0 0 0 0 0 0 5154 4 2225 4073 4231 105 191 28 292 33 4542 4634 191 28 293 28 5146 5145 4 8 2225 4231 105 191 293 5142 33 4542 4635 4073 0 2071 5133 5153 5146 5153 33 5014 4542 4635 4073 5133 2069 5133 5153 4818 4910 4466 5018 258 4215 5065 4153 5150 956 4699 6
In [2554]:
glaze_df.from_orton_cone.value_counts()
Out[2554]:
6            1969
10            796
5             697
9             649
04            288
8             168
4             131
5 ½      106
7              77
05             47
06             43
1              42
03             40
08             38
05 ½      23
2              14
3               9
11              7
02              5
13              2
01              2
12              1
Name: from_orton_cone, dtype: int64
In [2555]:
# clean the Cone Data and create  new columns that are in Degrees C
glaze_df['to_orton_cone']=glaze_df['to_orton_cone'].str.replace('05 ½','05')
glaze_df['to_orton_cone']=glaze_df['to_orton_cone'].str.replace('5 ½','5')
glaze_df['from_orton_cone']=glaze_df['from_orton_cone'].str.replace('05 ½','05')
glaze_df['from_orton_cone']=glaze_df['from_orton_cone'].str.replace('5 ½','5')

# create a dictionary for converting the pyrometric cone # to Degrees Centigrad
Cone_to_C = {}
Cone_to_C = {'022':586,'019':656,'018':686,'017':705,'016':742,'015':750,
              '014':757,'013':807,'012':843,'011':891,
              '010':891,'09':907,'08':922,'07':962,'06':981,'05':1021,
              '04':1046,'03':1071,'02':1078,'01':1093,
              '1':1109,'2':1112,'3':1115,'4':1141,'5':1159,
              '6':1185,'7':1201,'8':1211,'9':1224,'10':1251,
              '11':1272,'12':1285,'13':1310,'14':1351,
              'nan':np.nan,np.nan:np.nan}

temp_catagory = {'022':'Low','019':'Low','018':'Low','017':'Low','016':'Low','015':'Low',
              '014':'Low','013':'Low','012':'Low','011':'Low',
              '010':'Low','09':'Low','08':'Low','07':'Low','06':'Low','05':'Low',
              '04':'Low','03':'Low','02':'Low','01':'Low',
              '1':'Low','2':"Low",'3':"Mid",'4':"Mid",'5':"Mid",
              '6':"Mid",'7':"Mid",'8':"Mid",'9':'High','10':'High',
              '11':'High','12':'High','13':'High','14':1351,
              'nan':np.nan,np.nan:np.nan}
 
glaze_df['from_Degrees_C']=glaze_df['from_orton_cone'].apply(lambda x: Cone_to_C[x])
glaze_df['to_Degrees_C']=glaze_df['to_orton_cone'].apply(lambda x: Cone_to_C[x])

glaze_df['temp_catagory']=glaze_df['from_orton_cone'].apply(lambda x: temp_catagory[x])

              
In [2556]:
glaze_df['sum_color']=glaze_df['rgb_g']+glaze_df['rgb_b']+glaze_df['rgb_r']
In [2557]:
#create a new ordinal variable that takes the surface type and creats an ordinal numerical variable from it  

#create a dictionary to translate the ordinal variable
sheen_value_dict1={}
sheen_value_dict1 = {'Glossy'        :1,
                    'Glossy - Semi' :2,
                    'Satin'         :3,
                    'Satin - Matte' :4,
                    'Matte - Semi'  :5,
                    'Matte - Smooth':6,
                    'Matte'         :7,
                    'Matte - Stony' :8,
                    'Matte - Dry'   :9,
                     np.nan: np.nan}
is_glossy={}
is_glossy = {'Glossy'        :1,
                    'Glossy - Semi' :0,
                    'Satin'         :0,
                    'Satin - Matte' :0,
                    'Matte - Semi'  :0,
                    'Matte - Smooth':0,
                    'Matte'         :0,
                    'Matte - Stony' :0,
                    'Matte - Dry'   :0,
                     'nan':np.nan,
                     np.nan: np.nan}
sheens = []
sheens = glaze_df.surface_type.to_list()
sheen_value=[]

glaze_df['sheen_value']=glaze_df['surface_type'].apply(lambda x: sheen_value_dict1[x])

glaze_df['is_glossy']=glaze_df['surface_type'].apply(lambda x: is_glossy[x])

glaze_df['sheen_value'].head()
Out[2557]:
0    1
2    1
3    1
4    4
5    1
Name: sheen_value, dtype: int64
In [2558]:
#transparency_type
opacity_value_dict1={}
opacity_value_dict1 = {'Opaque':1,
                    'Translucent':2,
                    'Semi-opaque' :2,
                    'Transparent'  :3,
                     np.nan: np.nan}
glaze_df['opacity_value']=glaze_df['transparency_type'].apply(lambda x: opacity_value_dict1[x])

glaze_df.transparency_type.value_counts()
Out[2558]:
Opaque         1655
Transparent     725
Semi-opaque     653
Translucent     651
Name: transparency_type, dtype: int64
In [2559]:
#since Metal oxides that are not considered fluxes are used intermittantly in very low amounts
#they will be included by sumating their UMFs for each row,
#this will preven them from having to large of an effect on clustering or dimensionality reduction 
# but might help account for increased variance

glaze_df['colorant_oxide_sum']=glaze_df[['F_umf',
                        'V2O5_umf','Cr2O3_umf','MnO_umf',
                        'MnO2_umf','FeO_umf','Fe2O3_umf',
                        'CoO_umf','NiO_umf','CuO_umf',
                        'Cu2O_umf','PrO2_umf']].fillna(0).sum(axis=1)

glaze_df['opacifier_sum'] = glaze_df[['TiO2_umf','ZrO2_umf','P2O5_umf',
                                     'SnO2_umf']].fillna(0).sum(axis=1)

Data Exploration

In [2560]:
percent_df=pd.DataFrame()
percent_df = glaze_df[['SiO2_umf','Al2O3_umf','B2O3_umf',
                        'Li2O_umf','K2O_umf','Na2O_umf',
                        'KNaO_umf','MgO_umf','CaO_umf',
                        'SrO_umf','BaO_umf','ZnO_umf',
                        'PbO_umf','P2O5_umf','F_umf',
                        'V2O5_umf','Cr2O3_umf','MnO_umf',
                        'MnO2_umf','FeO_umf','Fe2O3_umf',
                        'CoO_umf','NiO_umf','CuO_umf',
                        'Cu2O_umf','TiO2_umf','ZrO2_umf',
                        'SnO2_umf','PrO2_umf']].mean()
percent_df.head()
plt.figure(figsize=(20,10))
percent_df.plot.bar(color=['g','gold'])
plt.xlabel('Ceramic Oxide',fontsize = 'large')
plt.ylabel('Average UMF',fontsize = 'large')
plt.title('Average Unity Molecular Formula for each ceramic oxide present', fontsize = 'xx-large')
plt.show()
In [2561]:
#test to see what metal oxides are included in RO and R2O

glaze_df['R2O_umf_sum']=glaze_df['Li2O_umf']+glaze_df['K2O_umf']+glaze_df['Na2O_umf']

glaze_df['RO_umf_sum']=glaze_df['MgO_umf']+glaze_df['CaO_umf']+glaze_df['SrO_umf']+glaze_df['BaO_umf']+glaze_df['ZnO_umf']+glaze_df['PbO_umf']+glaze_df['MnO_umf']
#glaze_df['P2O5_umf']
##+glaze_df['KNaO_umf']
#+glaze_df['PbO_umf']#+
plt.figure(figsize=(20,20))
plt.subplot(2,2,1)
plt.scatter(glaze_df.R2O_umf_sum,
            glaze_df.R2O_umf,
            color = 'orange')
plt.xlabel('R2O_umf')
plt.ylabel('sumation of R2O oxides umf')
plt.title('R2O_umf')

plt.subplot(2,2,2)
plt.scatter(glaze_df.RO_umf_sum,glaze_df.RO_umf)
plt.ylabel('sumation of RO oxides umf')
plt.xlabel('RO_umf')
plt.title('RO_umf')


plt.subplot(2,1,2)
glaze_df[['R2O_umf','RO_umf' ]].describe()
plt.hist(glaze_df['RO_umf' ],bins = 40,alpha = 0.5)
plt.hist(glaze_df['R2O_umf'],bins = 40,alpha = 0.5)
plt.title('Histogram of R20 and RO')
plt.show()
#glaze_df[['R2O_umf','RO_umf' ]].describe()

In the diagram above the histogram of R2O and RO can be seen. In a Unity Molecular Formula the sum of the fluxes R2O and RO are set equal to one, this explains the symmetry in the histograms seen above. All other materials are taken as a ratio of the fluxes in moles. The scatter plots above specify which metal oxides are being sumated to form R2O and RO fluxes. RO fluxes include oxides with : Magnesium, Calcium, Strontium, Barium, Zinc, Lead, and Manganese. R2O fluxes include oxides with: Lithium, Sodium, Potassium.

In [2562]:
plt.figure(figsize=(15,8))

xvar4 = glaze_df['RO_umf']
yvar4 = glaze_df['SiO2_Al2O3_ratio_umf']
cmap2 = plt.cm.viridis
norm2 = matplotlib.colors.Normalize(vmin=glaze_df.sheen_value.min(), vmax=glaze_df.sheen_value.max())
c2 = cmap2(norm2(glaze_df.sheen_value.values))

plt.scatter(xvar4,yvar4,c=glaze_df.sheen_value,cmap='viridis', alpha = 0.3)
plt.ylabel('SiO2_Al2O3_ratio_umf cone 6',fontsize = 'large')
plt.xlabel('RO_umf',fontsize = 'large')
plt.title('Plot of RO UMF and Silica to Alumina Ratio Hued to Surface Gloss',fontsize = 'xx-large')

sm2 = plt.cm.ScalarMappable(cmap=cmap2, norm=norm2)
plt.colorbar(sm2).set_label('sheen 1=glossy 10= matte', rotation=90,fontsize = 'large')
plt.ylim(0,40)

plt.show()
In [2563]:
plt.figure(figsize=(15,8))

xvar4 = glaze_df['B2O3_umf']
yvar4 = glaze_df['RO_umf']
cmap2 = plt.cm.plasma
norm2 = matplotlib.colors.Normalize(vmin=glaze_df.to_Degrees_C.min(), vmax=glaze_df.to_Degrees_C.max())
c2 = cmap2(norm2(glaze_df.to_Degrees_C.values))

plt.scatter(xvar4,yvar4,c=glaze_df.to_Degrees_C,cmap='plasma', alpha = 0.3)
plt.ylabel('Alkaline Earth Oxide UMF',fontsize = 'large')
plt.xlabel('UMF of Boron Oxide',fontsize = 'large')
plt.title('Plot of RO and Boron UMF Hued to Match Firing Temperature',fontsize = 'xx-large')

sm2 = plt.cm.ScalarMappable(cmap=cmap2, norm=norm2)
plt.colorbar(sm2).set_label('Maximum firing tepmerature 900C to 1350C ', rotation=90,fontsize = 'large')
plt.ylim(0,1)
plt.xlim(0.01,1)

plt.show()
In [2564]:
plt.figure(figsize=(15,8))

xvar4 = glaze_df['opacifier_sum']
yvar4 = glaze_df['SiO2_Al2O3_ratio_umf']
cmap2 = plt.cm.winter
norm2 = matplotlib.colors.Normalize(vmin=glaze_df.opacity_value.min(), vmax=glaze_df.opacity_value.max())
c2 = cmap2(norm2(glaze_df.opacity_value.values))

plt.scatter(xvar4,yvar4,c=glaze_df.sum_color,cmap='winter', alpha = 0.3)
plt.ylabel('SiO2_Al2O3_ratio_umf cone 6',fontsize = 'large')
plt.xlabel('Opacifiing Oxide',fontsize = 'large')
plt.title('Silica to Alumina Ratio versus Opacifying Oxide',fontsize = 'xx-large')

sm2 = plt.cm.ScalarMappable(cmap=cmap2, norm=norm2)
plt.colorbar(sm2).set_label('opacity_value 1= Transparent 3=Opaque', rotation=90,fontsize = 'large')
plt.ylim(0,20)
plt.xlim(0.01,2)

plt.show()
In [2565]:
plt.figure(figsize=(15,8))
cmap2 = plt.cm.cividis
norm2 = matplotlib.colors.Normalize(vmin=glaze_df.sum_color.min(), vmax=glaze_df.sum_color.max())
c2 = cmap2(norm2(glaze_df.sum_color.values))



xvar3 = glaze_df['colorant_oxide_sum']
yvar3 = glaze_df['SiO2_Al2O3_ratio_umf']
plt.scatter(xvar3,yvar3,
            c=glaze_df['sum_color']
            ,cmap='cividis',s=10, alpha = 0.8)
plt.ylabel('Sum of the Colorant Oxides',fontsize = 'large')
plt.xlabel('Silica to Alumina ratio UMF Hued to match ',fontsize = 'large')
plt.title('Plot of Colorant Oxides UMF to Silica Alumina Ratio',fontsize = 'xx-large')
plt.ylim(0,20)
plt.xlim(-0.1,2)


sm2 = plt.cm.ScalarMappable(cmap=cmap2, norm=norm2)
plt.colorbar(sm2).set_label('sum of RGB Values 0= black 765=white', rotation=90,fontsize = 'large')
plt.ylim(0,20)
plt.xlim(0.01,2)

plt.show
Out[2565]:
<function matplotlib.pyplot.show(*args, **kw)>
In [ ]:
 
In [2566]:
# select the features and standardize them 

X = glaze_df[[  
    'to_Degrees_C','sheen_value',
                  'sum_color',
                    'SiO2_umf',
                    'Al2O3_umf',
                    'B2O3_umf', 
                    'SiO2_Al2O3_ratio_umf',
                    'R2O_umf',
                    'RO_umf','colorant_oxide_sum','opacifier_sum']].fillna(0)

scaler = StandardScaler()
X_std = scaler.fit_transform(X)


#'P2O5_umf',colorant_oxide_sum','opacifier_sum' from_Degrees_C 'sheen_value',
#                  'sum_color',
                #'is_glossy',

Conduct UMAP for cluster visualization purposes

In [2567]:
# Dimensionality reduction 
#UMAP
umap_results = umap.UMAP(n_neighbors=15,
                      min_dist=0.1,
                      #metric='correlation',
                       #metric='cosine',
                      metric='euclidean',
                        n_components=2,
                        random_state=10
                        ).fit_transform(X_std)





#print('UMAP done! Time elapsed: {} seconds'.format(time.time()-time_start))
In [2568]:
#adding the UMAP Projections to the dataframe 
for n in range(2):
    glaze_df['UMAP{}'.format(n+1)]= umap_results[:, n]


#glaze_df[['UMAP1','UMAP2']]
In [2569]:
hue_list=[['sheen_value','viridis'],
          ['to_Degrees_C','plasma'],
          ['sum_color','rainbow'],
          ['opacity_value','cool']]
plt.figure(figsize=(30,20))
sns.set(rc={ 'figure.facecolor':'white'})

for i,hue in enumerate(hue_list):
    plt.subplot(2,2,i+1)
    plt.title('UMAP')
    plt.scatter(glaze_df['UMAP1'],  
                glaze_df['UMAP2'],
                c=glaze_df[hue[0]] ,
                s=10,
                cmap=hue[1],
                alpha = .5)
    plt.ylabel('UMAP2')
    plt.xlabel('UMAP1')
    plt.title('UMAP Projection hued by {}'.format(hue[0]))
In [2570]:
umap_std = scaler.fit_transform(umap_results)
for n in range(2):
    glaze_df['UMAP_std{}'.format(n+1)]= umap_std[:, n]

Split Data for Cluster Evaluation

In [2571]:
#split the dataset to compare clustering techniques
X_std1, X_std2, glaze_df_half1, glaze_df_half2,umap_std1,umap_std2 = train_test_split(
    X_std,
    glaze_df,umap_std,
    test_size=0.5,
    random_state=13579)

print(len(glaze_df_half1))
print(len(glaze_df_half2))
print(len(glaze_df))
2577
2577
5154
In [ ]:
 

Clustering

There are a lot of glazes that fall well outside the chemistry of a normal glaze, these are often 'effects glazes' that are used for a particular physical or visual quality, an example might be a metal saturate glaze that mimics the look of metal, or a crystalline glaze that forms crystals from devitrified silica. It is likely that DBSCAN will be useful however Kmeans and Gaussian Mixture Models will also be applied and compaired.

K-means Clustering¶

In [2572]:
Sum_of_squared_distances = []
Mannhattan_silhouette_scores = []
Euclidean_silhouette_scores = []
cosine_silhouette_scores = []

K = range(1,25)
for k in K:
    km = KMeans(n_clusters=k)
    km = km.fit(umap_std)
    Sum_of_squared_distances.append(km.inertia_)
    try:
        klusters = km.fit_predict(umap_std)
        Mannhattan_silhouette_scores.append(metrics.silhouette_score(umap_std, klusters, metric='manhattan'))
        Euclidean_silhouette_scores.append(metrics.silhouette_score(umap_std, klusters, metric='euclidean'))
        cosine_silhouette_scores.append(metrics.silhouette_score(umap_std, klusters, metric='cosine'))

    except:
        Mannhattan_silhouette_scores.append(1)
        Euclidean_silhouette_scores.append(1)
        cosine_silhouette_scores.append(1)
        None


plt.show()
In [2573]:
plt.figure(figsize=(15,8))  
plt.subplot(1,2,1)
plt.plot(K, Sum_of_squared_distances, 'bx-')
plt.xlabel('Number for K',fontsize=15)
plt.ylabel('Sum_of_squared_distances',fontsize=15)
plt.title('Elbow Method For Optimal k',fontsize=15)
plt.subplot(1,2,2)
labels = ['Euclidean Distance','Manhattan Distance','Cosine Distance']

plt.plot(K, Euclidean_silhouette_scores, 'bx-',color='b',label=labels[0])
plt.plot(K, Mannhattan_silhouette_scores, 'bx-',color='r',label=labels[1])
plt.plot(K, cosine_silhouette_scores,'bx-',color='g', label=labels[2])

plt.legend(fontsize=15)
plt.xlabel('Number for K',fontsize=15)
plt.ylabel('Silhouette Scores',fontsize=15)
plt.title('Silhouette Scores For Optimal k',fontsize=20)
Out[2573]:
Text(0.5, 1.0, 'Silhouette Scores For Optimal k')
In [2574]:
#Check for consistancy 

for i in range(4,11):
    n_clusters=i
    km = KMeans(n_clusters,)
    K_clusters1 = km.fit_predict(umap_std1)
    K_clusters2 = km.fit_predict(umap_std2)
    
    sil_score1 = metrics.silhouette_score(umap_std1, K_clusters1, metric='euclidean')
    sil_score2 = metrics.silhouette_score(umap_std2, K_clusters2, metric='euclidean')

    #plot the data 
    position = (i-3)*2
    plt.figure(figsize=(30,150))
    sns.set(rc={ 'figure.facecolor':'white'})

    plt.subplot(16,2,position-1)
    plt.title('UMAP')

    plt.scatter(umap_std1[:, 0],
            umap_std1[:, 1],
            c=K_clusters1 ,
            s=10,
            cmap='rainbow',
            alpha = .5)
    plt.ylabel('UMAP2',fontsize=20)
    plt.xlabel('UMAP1',fontsize=20)
    plt.title('Group 1 UMAP Projection with {} K clusters\n the silhouette score for this cluster is: {}'.format(i,sil_score1),fontsize=20)


    plt.subplot(16,2,position)
    plt.title('UMAP')

    plt.scatter(umap_std2[:, 0],
            umap_std2[:, 1],
            c=K_clusters2,
            s=10,
            cmap='rainbow',
             alpha = .5)
    plt.ylabel('UMAP2',fontsize=20)
    plt.xlabel('UMAP1',fontsize=20)
    plt.title('Group 2 UMAP Projection with {} K clusters\n the silhouette score for this cluster is: {}'.format(i,sil_score1),fontsize=20)
In [2575]:
n_clusters=5
km = KMeans(n_clusters,)
K_clusters1 = km.fit_predict(umap_std1)
K_clusters2 = km.fit_predict(umap_std2)
glaze_df_half1['K_clusters']=K_clusters1
glaze_df_half2['K_clusters']=K_clusters2

K_clusters = km.fit_predict(umap_std)

glaze_df['K_clusters']=K_clusters

sil_score = metrics.silhouette_score(umap_std, K_clusters, metric='euclidean')
if sil_score>0:
    print("{} the silhouette score is positive\n the preseence of {} clusters is likely".format(sil_score,n_clusters))

else:
    print("{} the silhouette score is negative\n the preseence of {} clusters is not likely".format(sil_score,n_clusters))
0.4298093318939209 the silhouette score is positive
 the preseence of 5 clusters is likely
In [2600]:
plt.figure(figsize=(30,20))

plt.subplot(2,2,1)
plt.title('UMAP',fontsize=15)

plt.scatter(glaze_df['UMAP1'],  glaze_df['UMAP2'],
            c=glaze_df['K_clusters'],s=10,cmap='rainbow',
            alpha = .5)
plt.ylabel('UMAP2',fontsize=15)
plt.xlabel('UMAP1',fontsize=15)
plt.title('UMAP Projection with Guassian Mixture Clusters ',fontsize=15)

plt.subplot(2,2,2)
xvar1 = glaze_df['RO_umf']
yvar1 = glaze_df['SiO2_Al2O3_ratio_umf']
plt.scatter(xvar1,yvar1,
            c=glaze_df['K_clusters'],
            cmap='rainbow',
            s=10, alpha = 0.5)
plt.ylabel('SiO2_Al2O3_ratio_umf cone 6',fontsize=15)
plt.xlabel('Alkaline Earth Oxides',fontsize=15)
plt.title('Hued for Guassian Mixture Clusters')
plt.ylim(0,40)


plt.subplot(2,2,3)
xvar1 = glaze_df['B2O3_umf']
yvar1 = glaze_df['SiO2_Al2O3_ratio_umf']
plt.scatter(xvar1,yvar1,
            c=glaze_df['K_clusters'],
            cmap='rainbow',
            s=10, alpha = 0.5)
plt.ylabel('SiO2_Al2O3_ratio_umf cone 6',fontsize=15)
plt.xlabel('Boron Oxide UMF',fontsize=15)
plt.title('Hued for Guassian Mixture Clusters')
plt.ylim(0,40)


plt.subplot(2,2,4)
xvar1 = glaze_df['colorant_oxide_sum']
yvar1 = glaze_df['SiO2_umf']
plt.scatter(xvar1,yvar1,
            c=glaze_df['K_clusters'],
            cmap='rainbow',
            s=10, alpha = 0.5)
plt.ylabel('SiO2_Al2O3_ratio_umf cone 6',fontsize=15)
plt.xlabel('Transition Metal Oxides',fontsize=15)
plt.title('Hued for Guassian Mixture Clusters')
plt.ylim(0,40)
plt.xlim(0,10)

plt.show()


labels = KMeans(n_clusters=3, random_state=123).fit_predict(X_std)
#print(metrics.silhouette_score(X_std, labels, metric='euclidean'))
num_cluster=[]
num_cluster = glaze_df['K_clusters'].value_counts().to_list()
print('\n')
for i, num in enumerate( num_cluster):
    print(' cluster {} has {} datapoints'.format(i,num))

 cluster 0 has 1271 datapoints
 cluster 1 has 1185 datapoints
 cluster 2 has 1039 datapoints
 cluster 3 has 945 datapoints
 cluster 4 has 714 datapoints

Agglomerative Clustering

In [2598]:
#Check for consistancy 

for i in range(2,10):
    # define the clustering algorithm
    num_clusters=i
    comp_aglomerative_clust = AgglomerativeClustering(linkage  = 'average',
                                            affinity  = 'cosine',
                                            n_clusters= num_clusters)
    
    agg_clusters1 = comp_aglomerative_clust.fit_predict(umap_std1)
    agg_clusters2 = comp_aglomerative_clust.fit_predict(umap_std2)
    
    sil_score1 = metrics.silhouette_score(umap_std1, agg_clusters1, metric='cosine')
    sil_score2 = metrics.silhouette_score(umap_std2, agg_clusters2, metric='cosine')

    #plot the data 
    position = (i-1)*2
    plt.figure(figsize=(30,150))
    sns.set(rc={ 'figure.facecolor':'white'})

    plt.subplot(16,2,position-1)
    plt.title('UMAP')

    plt.scatter(umap_std1[:, 0],
            umap_std1[:, 1],
            c=agg_clusters1 ,
            s=10,
            cmap='rainbow',
            alpha = .5)
    plt.ylabel('UMAP2',fontsize=20)
    plt.xlabel('UMAP1',fontsize=20)
    plt.title('Group 1 UMAP Projection with {} Agglomerative clusters\n the silhouette score for this cluster is: {}'.format(i,sil_score1),fontsize=20)


    plt.subplot(16,2,position)
    plt.title('UMAP')

    plt.scatter(umap_std2[:, 0],
            umap_std2[:, 1],
            c=agg_clusters2,
            s=10,
            cmap='rainbow',
             alpha = .5)
    plt.ylabel('UMAP2',fontsize=20)
    plt.xlabel('UMAP1',fontsize=20)
    plt.title('Group 2 UMAP Projection with {} Agglomerative clusters\n the silhouette score for this cluster is: {}'.format(i,sil_score1),fontsize=20)

The Agglomerative clustesr apear inconsistent after 6 clusters and apear in an as a pizza wedge shapes.

GMM Clustering

In [2578]:
#Check for consistancy 

for i in range(2,10):
    # define the clustering algorithm
    GMM = GaussianMixture(n_components=i,
                      covariance_type='full', #'full', 'tied', 
                      tol=0.001,
                      reg_covar=1e-06,
                      max_iter=1000,
                      n_init=1,
                      init_params='kmeans',#‘random’kmeans
                      weights_init=None,
                      means_init=None,
                      precisions_init=None,
                      random_state=None,
                      warm_start=False,
                      verbose=0,
                      verbose_interval=10)
    GMM_clusters1 = GMM.fit_predict(umap_std1)
    GMM_clusters2 = GMM.fit_predict(umap_std2)
    
    sil_score1 = metrics.silhouette_score(umap_std1, GMM_clusters1, metric='mahalanobis')
    sil_score2 = metrics.silhouette_score(umap_std2, GMM_clusters2, metric='mahalanobis')

    #plot the data 
    position = (i-1)*2
    plt.figure(figsize=(30,150))
    sns.set(rc={ 'figure.facecolor':'white'})

    plt.subplot(16,2,position-1)
    plt.title('UMAP')

    plt.scatter(umap_std1[:, 0],
            umap_std1[:, 1],
            c=GMM_clusters1 ,
            s=10,
            cmap='rainbow',
            alpha = .5)
    plt.ylabel('UMAP2',fontsize=20)
    plt.xlabel('UMAP1',fontsize=20)
    plt.title('Group 1 UMAP Projection with {} GMM clusters \n the silhouette score for this cluster is: {}'.format(i,sil_score1),fontsize=20)


    plt.subplot(16,2,position)
    plt.title('UMAP')

    plt.scatter(umap_std2[:, 0],
            umap_std2[:, 1],
            c=GMM_clusters2,
            s=10,
            cmap='rainbow',
             alpha = .5)
    plt.ylabel('UMAP2',fontsize=20)
    plt.xlabel('UMAP1',fontsize=20)
    plt.title('Group 2 UMAP Projection with {} GMM clusters \n the silhouette score for this cluster is: {}'.format(i,sil_score2),fontsize=20)

The Gaussian Mixture clusters apear inconsistent even with 2 clusters

DBSCAN

In [2579]:
#Selection for Epsilon 

#range_eps = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,]

range_eps = np.linspace(0.1,2,19)
sil_score_Epsilon_euc=[]
sil_score_Epsilon_cos=[]
sil_score_Epsilon_man=[]

for i in  range_eps:
    #print("eps value is "+str(i))
    try:
        db_euc = DBSCAN(eps=i, min_samples = 15,metric='euclidean' ).fit(X_std)
        core_samples_mask = np.zeros_like(db_euc.labels_, dtype=bool)
        core_samples_mask[db_euc.core_sample_indices_] = True
        labels_euc = db_euc.labels_
        silhouette_avg_euc = silhouette_score(X_std,labels_euc)
    except:
        silhouette_avg_euc=np.nan
    
    try:
        db_man = DBSCAN(eps=i, min_samples = 15,metric='manhattan').fit(X_std)
        core_samples_mask = np.zeros_like(db_man.labels_, dtype=bool)
        core_samples_mask[db_man.core_sample_indices_] = True
        labels_man = db_man.labels_
        silhouette_avg_man = silhouette_score(X_std,labels_man)
    except:
        silhouette_avg_man=np.nan
    
    try:
        db_cos = DBSCAN(eps=i, min_samples = 15,metric='cosine').fit(X_std)
        core_samples_mask = np.zeros_like(db_cos.labels_, dtype=bool)
        core_samples_mask[db_cos.core_sample_indices_] = True
        labels_cos = db_cos.labels_
        silhouette_avg_cos = silhouette_score(X_std,labels_cos)
    except:
        silhouette_avg_cos=np.nan
        
    
    sil_score_Epsilon_euc.append(silhouette_avg_euc)
    sil_score_Epsilon_man.append(silhouette_avg_man)
    sil_score_Epsilon_cos.append(silhouette_avg_cos)

 
In [2580]:
plt.figure(figsize=(15,8))   
plt.plot(range_eps,sil_score_Epsilon_euc, 'bx-',color='c', label='using euclidean distance ')
plt.plot(range_eps,sil_score_Epsilon_man, 'bx-',color='gold',label='using manhattan distance ')
plt.plot(range_eps,sil_score_Epsilon_cos, 'bx-',color='m',label='using cosine distance ')

plt.legend()
plt.xlabel('Epsilon Value')
plt.ylabel('Silhouette Score')

plt.xticks()
plt.show()
In [2581]:
# Defining the agglomerative clustering
#working
dbscan_cluster1 = DBSCAN(eps=1.27, min_samples=8,metric = "euclidean")#produces 3 clusters
dbscan_cluster2 = DBSCAN(eps=1.27, min_samples=8,metric = "euclidean")


#test
dbscan_clusters1 = dbscan_cluster1.fit_predict(X_std1)
dbscan_clusters2 = dbscan_cluster1.fit_predict(X_std2)

sil_score1 = metrics.silhouette_score(X_std1, dbscan_clusters1, metric='euclidean')
sil_score2 = metrics.silhouette_score(X_std2, dbscan_clusters2, metric='euclidean')


#plot the data 
plt.figure(figsize=(30,15))
sns.set(rc={ 'figure.facecolor':'white'})

plt.subplot(1,2,1)
plt.title('UMAP')

plt.scatter(umap_std1[:, 0],
        umap_std1[:, 1],
        c=dbscan_clusters1 ,
        s=10,
        cmap='rainbow',
        alpha = .5)
plt.ylabel('UMAP2',fontsize=20)
plt.xlabel('UMAP1',fontsize=20)
plt.title('Group 1 DBSCAN UMAP Projection with a silhouette score of: {} clusters'.format(sil_score1),fontsize=20)


plt.subplot(1,2,2)
plt.title('UMAP')

plt.scatter(umap_std2[:, 0],
        umap_std2[:, 1],
        c=dbscan_clusters2,
        s=10,
        cmap='rainbow',
         alpha = .5)
plt.ylabel('UMAP2',fontsize=20)
plt.xlabel('UMAP1',fontsize=20)
plt.title('Group 2 DBSCAN UMAP Projection with a silhouette score of: {} clusters'.format(sil_score2),fontsize=20)
plt.show()
print('cluster1',np.unique(dbscan_clusters1))
print('cluster2',np.unique(dbscan_clusters2))
cluster1 [-1  0  1  2]
cluster2 [-1  0  1  2  3]
In [2582]:
#DBSCAN Clustering

#working
#dbscan_cluster = DBSCAN(eps=1.27, min_samples=10,metric = "euclidean")#produces 3 clusters

dbscan_cluster = DBSCAN(eps=1.27, min_samples=16,metric = "euclidean")#produces 3 clusters

dbscan_clusters = dbscan_cluster.fit_predict(X_std)

glaze_df['dbscan_clusters']=dbscan_clusters

print('cluster ',np.unique(dbscan_clusters))
cluster  [-1  0  1  2]
In [2583]:
#plot the first two results of each reduction technique
sil_score = metrics.silhouette_score(X_std, dbscan_clusters, metric='euclidean')


print("{} is the silhouette score for the total sample".format(sil_score))
num_cluster=[]
num_cluster = glaze_df['dbscan_clusters'].value_counts().to_list()
print(glaze_df['dbscan_clusters'].value_counts())



plt.figure(figsize=(30,30))
sns.set(rc={ 'figure.facecolor':'white'})

plt.subplot(2,2,1)
plt.title('UMAP')

plt.scatter(glaze_df['UMAP1'],  glaze_df['UMAP2'],
            c=glaze_df['dbscan_clusters'],s=10,cmap='rainbow',
            alpha = .5)
plt.ylabel('UMAP2')
plt.xlabel('UMAP1')
plt.title('UMAP Projection with Group 1 Guassian Mixture Clusters ')

plt.subplot(2,2,2)
xvar1 = glaze_df['RO_umf']
yvar1 = glaze_df['SiO2_Al2O3_ratio_umf']
plt.scatter(xvar1,yvar1,
            c=glaze_df['dbscan_clusters'],
            cmap='rainbow',
            s=10, alpha = 0.5)
plt.ylabel('SiO2_Al2O3_ratio_umf cone 6')
plt.xlabel('Alkaline Earth Oxides')
plt.title('all')
plt.ylim(0,40)


plt.subplot(2,2,3)
xvar1 = glaze_df['B2O3_umf']
yvar1 = glaze_df['SiO2_Al2O3_ratio_umf']
plt.scatter(xvar1,yvar1,
            c=glaze_df['dbscan_clusters'],
            cmap='rainbow',
            s=10, alpha = 0.5)
plt.ylabel('SiO2_Al2O3_ratio_umf cone 6')
plt.xlabel('Boron Oxide UMF')
plt.title('all')
plt.ylim(0,40)


plt.subplot(2,2,4)
xvar1 = glaze_df['colorant_oxide_sum']
yvar1 = glaze_df['SiO2_umf']
plt.scatter(xvar1,yvar1,
            c=glaze_df['dbscan_clusters'],
            cmap='rainbow',
            s=10, alpha = 0.5)
plt.ylabel('SiO2_Al2O3_ratio_umf cone 6')
plt.xlabel('Transition Metal Oxides')
plt.title('all')
plt.ylim(0,15)
plt.xlim(0,2)

plt.show()
0.15506506145031015 is the silhouette score for the total sample
 0    4254
-1     809
 1      76
 2      15
Name: dbscan_clusters, dtype: int64

Explore Clusters

The only clusters that gave consistent results where the K means and DBSCAN clusters, they are compaired in plots below

In [ ]:
 
In [2584]:
 cluster_list=[['K_clusters','rainbow'],
               ['dbscan_clusters','rainbow'],
               ['sheen_value','viridis']]
plt.figure(figsize=(20,5))

for i, cluster in enumerate(cluster_list):
    plt.subplot(1,4,i+1)
    xvar1 = glaze_df['RO_umf']
    yvar1 = glaze_df['SiO2_Al2O3_ratio_umf']
    hue = glaze_df[cluster[0]]
    color_map = cluster[1]
    plt.scatter(xvar1,yvar1,
            c=hue,
            cmap=color_map,
            s=10, 
            alpha = 0.2)
    plt.ylabel('SiO2_Al2O3_ratio_umf cone 6')
    plt.xlabel('RO2_umf')
    plt.title(cluster[0])
    plt.ylim(0,20)
    #plt.xlim(-1,3)
In [2585]:
cluster_list=[ ['K_clusters','rainbow'],
               ['dbscan_clusters','rainbow'],
               ['to_Degrees_C','plasma']]
plt.figure(figsize=(20,5))

for i, cluster in enumerate(cluster_list):
    plt.subplot(1,4,i+1)
    xvar1 = glaze_df['B2O3_umf']
    yvar1 = glaze_df['SiO2_umf']
    hue = glaze_df[cluster[0]]
    color_map = cluster[1]
    plt.scatter(xvar1,yvar1,
            c=hue,
            cmap=color_map,
            s=10, alpha = 0.5)
    plt.ylabel('SiO2_umf')
    plt.xlabel('B2O3_umf')
    plt.title(cluster[0])
    plt.ylim(0,20)
    plt.xlim(-.1,1.5)
In [2586]:
cluster_list=[['K_clusters','rainbow'],
               ['dbscan_clusters','rainbow'],
               ['sum_color','cool']]
plt.figure(figsize=(20,5))

for i, cluster in enumerate(cluster_list):
    plt.subplot(1,4,i+1)
    xvar1 = glaze_df['colorant_oxide_sum']
    yvar1 = glaze_df['SiO2_Al2O3_ratio_umf']
    hue = glaze_df[cluster[0]]
    color_map = cluster[1]
    plt.scatter(xvar1,yvar1,
            c=hue,
            cmap=color_map,
            s=10, alpha = 0.5)
    plt.ylabel('SiO2_Al2O3_ratio_umf cone 6')
    plt.xlabel('colorant_oxide_sum')
    plt.title(cluster[0])
    plt.ylim(0,20)
    plt.xlim(-.1,2)
In [2587]:
# explore clusters by temperature and sheen 
plt.figure(figsize=(20,20))
plt.subplot(3,1,1)
sns.violinplot(x="dbscan_clusters", y="sheen_value", data=glaze_df)
plt.subplot(3,1,2)
sns.violinplot(x="dbscan_clusters", y="to_Degrees_C", data=glaze_df)
plt.subplot(3,1,3)
sns.violinplot(x="dbscan_clusters", y="sum_color", data=glaze_df)
Out[2587]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a71947090>
In [2588]:
cross_examin = pd.crosstab(columns = glaze_df.material_type,index =dbscan_clusters)



plt.figure(figsize=(40,10))
sns.set(rc={ 'figure.facecolor':'tab:gray'})

sns.heatmap(cross_examin, annot=True,) 
plt.title("P-value Matrix") 
plt.show()

plt.show()
In [2589]:
 glaze_df[['name','material_type',
          'surface_type','to_orton_cone',
          'transparency_type','temp_catagory',
          'is_glossy','dbscan_clusters',
          'RO_umf','R2O_umf', 'B2O3_umf']].loc[dbscan_clusters==0].head()
Out[2589]:
name material_type surface_type to_orton_cone transparency_type temp_catagory is_glossy dbscan_clusters RO_umf R2O_umf B2O3_umf
0 Base Glaze Peltzman Glaze Glossy 8 Transparent Mid 1 0 0.7801 0.2199 0.0000
2 Celadon-type glaze David Pier Celadon Glossy 11 Transparent High 1 0 0.8290 0.1710 0.0000
3 Pier's Pure Lux-Deluxe Revised Clear Glossy 9 Transparent Mid 1 0 0.7751 0.2249 0.1224
4 Blue Acero Blue Satin - Matte 11 Semi-opaque High 0 0 0.7543 0.2457 0.0000
5 #207.3 Clear Glossy 8 NaN Mid 1 0 0.7964 0.2036 0.0000
In [ ]:
 
In [2590]:
# explore clusters by temperature and sheen 
plt.figure(figsize=(40,40))
plt.subplot(3,1,1)
sns.violinplot(x="K_clusters", y="sheen_value", data=glaze_df)
plt.subplot(3,1,2)
sns.violinplot(x="K_clusters", y="to_Degrees_C", data=glaze_df)
plt.subplot(3,1,3)
sns.violinplot(x="K_clusters", y="sum_color", data=glaze_df)

plt.show()
In [2591]:
# compare cluster labels with material type labels
cross_examin = pd.crosstab(columns = glaze_df.material_type,index =K_clusters)

plt.figure(figsize=(40,10))
sns.set(rc={ 'figure.facecolor':'tab:gray'})
sns.heatmap(cross_examin, annot=True, cmap="YlGnBu") 
plt.title("P-value Matrix") 
plt.show()

plt.show()
In [2592]:
# compare cluster labels with material type labels
cross_examin = pd.crosstab(columns = glaze_df.sheen_value,index =glaze_df.K_clusters)

plt.figure(figsize=(40,10))
sns.set(rc={ 'figure.facecolor':'tab:gray'})
sns.heatmap(cross_examin, annot=True, cmap="YlGnBu") 
plt.title("P-value Matrix") 
plt.show()

plt.show()
In [2593]:
 glaze_df[['name','material_type',
          'surface_type','to_orton_cone',
          'transparency_type','temp_catagory',
          'is_glossy','dbscan_clusters',
          'K_clusters']].loc[K_clusters==0].tail(40)
Out[2593]:
name material_type surface_type to_orton_cone transparency_type temp_catagory is_glossy dbscan_clusters K_clusters
6518 VC/Easy Glossy + Copper Carbonate 1.5, RIO 0.5 Copper Glossy 6 Transparent Mid 1 0 0
6519 VC/Easy Glossy + Copper Carbonate 1, RIO 1 Copper Glossy 6 Transparent Mid 1 0 0
6520 VC/Easy Glossy + Copper Carbonate 0.5, RIO 1.5 Copper Glossy 6 Transparent Mid 1 0 0
6521 VC/Easy Glossy + RIO 2 Copper Glossy 6 Transparent Mid 1 0 0
6523 Gail Kendall cone 04 clear Clear Glossy 04 Transparent Low 1 -1 0
6545 Blooming Blue Glaze Glossy 6 Opaque Mid 1 0 0
6593 Starry night Glaze Glossy 6 NaN Mid 1 0 0
6598 Floating Blue Cobalt Glossy 6 Opaque Mid 1 0 0
6680 WPG01 Shiny White White, Off-White Glossy 6 Opaque Mid 1 0 0
6736 Licorice base + 5 Rio, 1 Titanium Diox. Amber Glossy - Semi 6 Semi-opaque Mid 0 -1 0
6755 WPG13 Olive Moss Rutile Glossy 6 NaN Mid 1 0 0
6816 P.V. Liner Clear Glossy 6 Transparent Mid 1 0 0
7041 Laurel Rutile Glossy - Semi 04 Translucent Low 0 -1 0
7044 Amber Nickel Glossy - Semi 04 Translucent Low 0 -1 0
7045 Aqua Turquoise Glossy 04 Opaque Low 1 -1 0
7052 Clear (ArtAlliance) Clear Glossy 5 Transparent Mid 1 0 0
7125 Ben Fiess Cone 2 Red Crust (MFG0010B) Crawling Matte 2 Opaque Low 0 0 0
7132 HTB 46A Clear Glossy 8 Transparent Mid 1 -1 0
7154 Rowe's Brown Glaze Glossy 6 Translucent Mid 1 2 0
7161 Amber Fine Sparkle Amber Glossy 03 Translucent Low 1 -1 0
7164 HTB 42 Clear Glossy 6 Translucent Mid 1 2 0
7167 HMB49 Clear Glossy 6 Semi-opaque Mid 1 0 0
7242 Transparent - NCMC Glaze Glossy - Semi 6 Transparent Mid 0 0 0
7307 Murfitt Metallic Glaze Metallic Matte - Semi 6 Opaque Mid 0 -1 0
7318 Lowfire Glaze 3124 (White) White, Off-White Glossy 02 Semi-opaque Low 1 -1 0
7368 GREY- Floating Blue Base Specialty Glossy 7 Opaque Mid 1 0 0
7401 Metallic Bronze (Copy) Metallic Matte - Smooth 6 Opaque Mid 0 -1 0
7409 Transparente Clear Matte - Semi 7 NaN Mid 0 -1 0
7421 Murfitt Metallic Glaze (Copy) (Copy) Metallic Matte - Semi 6 Opaque Mid 0 -1 0
7454 Antiquity Bronze NG Metallic Satin 6 Opaque Mid 0 -1 0
7503 DeBoos 3 White, Off-White Glossy 03 Opaque Low 1 -1 0
7613 White Peak Earthenware Matte 04 Opaque Low 0 -1 0
7700 Batz Mod Clear Glossy - Semi 03 Transparent Low 0 -1 0
7739 Broken Celadon (From Amazing Glaze) Glaze Glossy - Semi 10 Semi-opaque Mid 0 -1 0
7800 Waterfall Brown Iron Satin 6 Opaque Mid 0 -1 0
7820 Lichen crawl Glaze Matte - Semi 04 NaN Low 0 -1 0
7826 Floating Blue 1 (Midrange Glazes) Blue Glossy - Semi 7 Opaque Mid 0 0 0
7849 Waterfall Brown MC6 Iron Satin 6 Opaque Mid 0 -1 0
7896 Icy Blue Blue Glossy - Semi 03 Transparent Low 0 -1 0
7910 Blue Raspberry Glaze Matte 06 Opaque Low 0 -1 0
In [ ]:

In [2594]:
#'material_type' 'to_orton_cone','temp_catagory'''transparency_type'''
glaze_df['to_orton_cone'].loc[K_clusters==2].value_counts()
Out[2594]:
6     612
10    286
7      84
8      76
9      66
5      51
11     43
12     19
13     11
3       6
03      4
1       4
4       2
2       2
04      2
02      1
05      1
14      1
Name: to_orton_cone, dtype: int64
In [2595]:
glaze_df[['rgb_r','rgb_g',
          'rgb_b','is_glossy','sheen_value',
          'SiO2_Al2O3_ratio_umf',
          'R2O_umf','RO_umf',
          'to_Degrees_C',
          'colorant_oxide_sum',
          'opacity_value', 'B2O3_umf',
          'K_clusters']].loc[K_clusters==2].describe()
Out[2595]:
rgb_r rgb_g rgb_b is_glossy sheen_value SiO2_Al2O3_ratio_umf R2O_umf RO_umf to_Degrees_C colorant_oxide_sum opacity_value B2O3_umf K_clusters
count 796.000000 796.000000 796.000000 1271.000000 1271.000000 1271.000000 1271.000000 1271.000000 1271.000000 1271.000000 893.000000 1271.000000 1271.0
mean 114.501256 121.419598 118.923367 0.130606 4.867034 6.586665 0.403060 0.596940 1207.551534 0.397616 1.346025 0.045316 2.0
std 100.325314 102.166973 102.923861 0.337101 2.254048 3.117412 0.249437 0.249437 36.780541 4.657758 0.544107 0.077234 0.0
min 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 1021.000000 0.000000 1.000000 0.000000 2.0
25% 0.000000 0.000000 3.000000 0.000000 3.000000 4.703200 0.207650 0.460950 1185.000000 0.007300 1.000000 0.000000 2.0
50% 102.000000 117.000000 114.500000 0.000000 5.000000 5.498600 0.328200 0.671800 1185.000000 0.046700 1.000000 0.000000 2.0
75% 219.000000 237.000000 240.250000 0.000000 7.000000 7.864800 0.539050 0.792350 1251.000000 0.120750 2.000000 0.075800 2.0
max 255.000000 255.000000 255.000000 1.000000 9.000000 30.956200 1.000000 1.000000 1351.000000 113.950000 3.000000 0.519000 2.0

Results

The variables used in clustering were selected based on their contribution to recipes. The bulk of most glazes are made of silica and alumina. In addition boron, alkali, and alkaline earth oxides are added to enhance melting. Colorant oxides were grouped and summated, along with opacifying oxides. The final firing temperature in degrees C was used as well. All variables were standardized and a UMAP projection was conducted for visualizations and for use with clustering. The results from clustering suggest utilizing two methods for different purposes: DBSCAN using the standardized variables and Kmeans using the UMAP projections.

In [2599]:
from IPython.display import Image
Image(url= "https://s3.amazonaws.com/glazy.org/public/uploads/recipes/68/l_21968.5b8de516bcb35.jpg")
Out[2599]:

The above image is a glaze titled 'Tichane Triaxial Celadon (Blue)' in cluster 0 for DBSCAN

DBSCAN

DBSCAN formed 3 weak clusters and a large group of outliers. The algorithm used the Standardized variables and not the UMAP projections and it generated a silhouette score of 0.22, it was quite successful at separating anomalies. The first cluster is populated with typical glazes close to a 0.7 R2O to 0.3 RO flux ratios, and common ratios of silica to alumina. The glazes that clustered in the first group would fall into the realm of durable glazes for functional surfaces such as tile, dinner, and sanitation-ware. The second cluster formed around Shino glazes (also known as carbon trap glazes) a type of Japanese glaze that is high in alkali metal flux and alumina. These glazes are commonly used in wood kilns (kilns where wood is the fuel source). These glazes can be problematic when applied over top of other glazes. The third cluster contained mid-range glazes with alkaline earth oxides RO_umf ranging from 0.80-0.87 and a large amount of boron ~0.7UMF. Many of the glazes in this cluster are recognized for forming what is known as a floating blue effect, these are small smokey blue opacities that form in thicker parts of the glaze resulting in a variegated appearance. Glazes that were separated as outliers fell into the realm of 'effects glazes', these glazes usually fall well outside the chemistry of a typical glaze, and are formulated to create a certain visual effect such as crystals, sparkle, or metallic luster. These glazes are prone to issues such as metal leaching, crazing, dunting, and shivering, and efflorescence. This makes DBSCAN useful for making suggestions to individuals and companies producing work that has to function and want to create a glaze palette that would not fail their customers.

In [2597]:
from IPython.display import Image
Image(url= "https://s3.amazonaws.com/glazy.org/public/uploads/recipes/25/l_36525.5cca35070fb88.jpg")
Out[2597]:

Above is an example of a metallic saturate glaze found in the outlier category (-1) from DBSCAN. It is listed as toxic on the website.

Kmeans

By using K-means on the UMAP projections, 5 clusters were formed with a silhouette score of 0.429. This was one of the more consistent clustering methods, however, the strength of the clusters is still quite weak.

  1. k cluster0 contains mid-range glazes(Orton cone 4-8) that fire to ~1200C,
  2. K-cluster contains mid-range glazes(Orton cone 4-8) that fire to ~1200C, many of the glazes are dark in color, and a large majority of the glazes are glossy with a small number of semi-gloss glazes.
  3. Low fire glazes with some mid-range(Orton cone 04-8) glazes present, these glazes are highly variable in regards to most other properties
  4. Contains both high fire and mid-range glazes(Orton cone 4-14) that fire to ~1200C and has the largest number of matte glazes for any of the clusters.
  5. Contains primarily high fire glazes most of which are glossy with a high variation in color.

K-means were useful in its ability to create several small weak clusters with the UMAP projections. The K-means clusters were able to separate most glazes by firing temperature range.

In [ ]:
 

Issues Around Data

Ultimately this dataset was quite difficult to cluster, nearly all clusters formed had silhouette scores below 0.5. This was due to several factors. Several permutations of variables, dimensionality reduction, clustering methods, and hyperparameters were used in an effort to form consistent well-differentiated clusters. Sparsity was a large problem with this dataset, glazes often employ different metal oxides for the same purpose, for example, a glaze could have a mole to mole substitution of lithium for sodium with little change in color, quality, or physical characteristics. Many of the metal oxides used as colorants for glazes are used in small amounts and can vary widely in how they alter the glaze and the strength of the color. These attributes of metal oxides have led to a large number of zero values for each oxide. The distribution of the data was problematic as well, the dataset had many outliers, this is because most glazes for use on functional surfaces deviate very little from a specific formula, most glazes don't deviate far from a silica to alumina ratio of 7:1 and an RO to R2O flux ratio of 7:2, in addition very few glazes have a high amount of colorant metals in them usually under 10% for added color. This dataset also includes glazes for special effects, these 'effects glazes' often have chemistries that are on the edge of or well outside of the chemical makeup of most glazes. Overlap was also a problem with much of the data. many glazes are only altered by changing one ingredient such as a colorant or metal flux, this results in the overall chemistry of the glaze remaing the same. For example alumina can alter the color and or physical qualities of the glaze quite drastically all the while maintaining the same values for the other variables.

Conclusions

Ultimately this dataset was quite difficult to cluster, DBSCAN could prove useful to a company producing ceramic objects that want to have a better idea of risk associated with the glaze they are using. K- means could be considered useful for selecting glazes for a someone unfamiliar with glaze chemistry, however, it wouldn't provide any more information than someone with experience in ceramics wouldn’t already know.

Further Experimentation

Several avenues of further experimentation and data collection became apparent when working with this dataset.

In regards to data collection, the currently available color data really only helps in differentiating dark and light glazes. There also isn’t any information on how the color data was collected. Glazes tend to show a high level of subtle variation in color, something that this dataset clearly doesn't capture. One way to do this would be to take images of example glaze tiles found on the web site and sample them for color variation. This would likely require a neural network to define the boundaries of the tile in the image, along with a way to correct variation from lighting methods, white balance, and camera types.

To further improve clusters, it might be possible to use some sort of ensemble method for clustering glazes, a method where DBSCAN outliers are clustered separately of glazes that fall inside the dense clusters of the dataset. This might be able to improve clustering since the distances wouldn’t vary as much. The high overlap, wide variance, and high dimensionality of the dataset suggests it might be better for some sort of regression analysis to predict qualities. A regression analysis of melting temperature, color, opacity, and surface could be a useful tool for someone trying to select traits of a glaze or for someone who is trying to substitute materials. This would likely require information about the raw ingredients used.

In [ ]: